Abstract. We start by studying the distribution of (cyclically reduced) elements of the free groups $F_n$ with respect to their abelianization (or equivalently, their class in $H_1(F_n, \mathbb{Z})$). We derive an explicit generating function, and a limiting distribution, by means of certain results (of independent interest) on Chebyshev polynomials; we also prove that the reductions mod $p$ ($p$ an arbitrary prime) of these classes are asymptotically equidistributed, and we study the deviation from equidistribution. We extend our techniques to a more general setting and use them to study the statistical properties of long cycles (and paths) on regular (directed and undirected) graphs. We return to the free group to study some growth functions of the number of conjugacy classes as a function of their cyclically reduced length.



Introduction 

In this paper we begin by studying certain growth functions of the free group $F_r$, related to well-studied questions on the growth functions of geodesics on manifolds. The free group is a relatively simple combinatorial object, and this allows us to get fairly complete answers to our questions. Our techniques, which are quite elementary, allow us to get precise results on the distribution of elements in $F_r$ as a function of their abelianization and in terms of their abelianization mod $p$. Our techniques turn out to be easily extensible to the study of paths in graphs with coefficients in compact groups.

Here is an outline of the paper: In Section 1 we set up an equivalence between counting cyclically reduced words on the free group $F_r$ and counting circuits on an associated graph $\mathcal{G}_r$, which, in turn, involves understanding the spectrum of the adjacency matrix of $\mathcal{G}_r$ (of course the answer is easily obtained, and is well-known; for convenience we state it as Theorem 1.1). We use this framework to obtain a generating function for the number of elements of a fixed cyclically reduced length with prescribed abelianization (or homology class). This turns out to be essentially a Chebyshev polynomial of the first kind; see Definition 2.2 of the function $R_r$ and Theorem 2.3 (a very brief introduction to Chebyshev polynomials is given in Section 3). The fact that the function $R_r(c; x)$ (at least for some special values of the parameter $c$) is a combinatorial generating function implies a previously unnoticed positivity result on Chebyshev polynomials; this result is generalized in Section 4 in Theorems 4.1 and 4.2. Theorem 4.2 is used in Section 5 to derive a limiting distribution (as $n$ tends to infinity) of cyclically reduced words of length $n$ among the possible homology classes. From the analytic standpoint this is also a qualitative result about Chebyshev polynomials, complementing the positivity Theorems 4.1 and 4.2. In Section 6 we show that if we study homology mod $p$, then the cyclically reduced words in $F_r$ are asymptotically equidistributed among the classes in $H_1(F_r, \mathbb{Z}/p\mathbb{Z})$. We also succeed in estimating the extent to which the cyclically reduced words in $F_r$ are not equidistributed mod $p$ (Section 6.1).

While the results in Sections 5 and 6 seem to depend on the explicit generating function that we have obtained, in Section 7 we show that our techniques are more general, and use them to study the equidistribution properties of long walks on regular graphs (we obtain a complete answer, Theorem 7.1) and, without any change, closed orbits of irreducible primitive Markov processes (with a finite number of states). The arguments use elementary perturbation theory and the necessary technical results are contained in Section 10.

In Section 8 we extend our methods to study the functions defined on the edges of a graph, and as an application we derive the statistical properties of long walks without backtracking on the edges of an undirected graph.

We apply our methods to derive equidistribution results for long walks with coefficients in compact groups in Sections 7.1 and 9. Our results are completely explicit, in that knowing the irreducible representations of the group in question allows us to obtain complete asymptotics for the convergence to uniformity. Our results also apply, via the construction of a directed edge graph, to the statistics of "geodesic", that is, backtrackless paths (Section 8). This, in turn, implies a result on the statistical properties of "primitive" orbits of Markov processes as above.

In Section |l^ we point out real and philosophical applications of the above men- 
tioned result to group theory (where this all started) and geometry. 

Finally, in Sections lS- 14.l| we derive a relationship between the number of cycli- 
cally reduced words and the number of conjugacy classes of bounded length. While 
the generating function of the first is a rational function, the generating function 
of the second is the integral of a Lambert series with an infinite number of poles. 
These results are then extended to a slightly more general case than that of free groups. We then (in Section 15) compute a zeta function for primitive conjugacy classes, and show that this is a rational function.



1. A MODEL AND A GENERATING FUNCTION 

Let $G$ be the free group $F_r = \langle a_1, \dots, a_r \rangle$, and let $g \in G$ be an element. The defining property of $G$ is that $g$ is uniquely represented by a reduced word in $a_1, \dots, a_r$, that is, a word where $a_i$ is never adjacent to $a_i^{-1}$ (Notation: in the sequel we shall write $W$ for $w^{-1}$). We observe that such words over the alphabet $a_1, A_1, \dots, a_r, A_r$ can, in turn, be generated by walks on the graph $\mathcal{G}_r$, constructed as follows: $\mathcal{G}_r$ has $2r$ vertices, labelled with the symbols $a_1, \dots, a_r, A_r, \dots, A_1$ - this peculiar order will simplify notation later. The vertex corresponding to $a_i$ is connected by an edge to every vertex except $A_i$. In particular, there is a loop joining $a_i$ to itself (so that $\mathcal{G}_r$ is not a simple graph). A walk $v_1 v_2 \dots v_k$ gives the word $v_2 \dots v_k$, so the correspondence between walks and words is a $(2r-1)$-to-1 mapping. Note, however, that if we restrict our attention to closed walks (circuits with basepoint) on $\mathcal{G}_r$, then those are in bijective correspondence with cyclically reduced words in $G$. In the sequel we will be interested exclusively in cyclically reduced words.

1.1. Counting cyclically reduced words. To count cyclically reduced words, then, we need to count circuits in $\mathcal{G}_r$. This is a well-understood problem: if $A_r$ is the adjacency matrix of $\mathcal{G}_r$, then the number of circuits of length $k$ is equal to the trace of $A_r^k$. To compute this trace we must compute the spectrum of $A_r$, and to do this, it is better to write $A_r = J_{2r} - P_r$, where $J_N$ is an $N \times N$ matrix all of whose elements are 1 and $P_r$ is the $2r \times 2r$ matrix such that

$$(P_r)_{ij} = \begin{cases} 1, & \text{if } i + j = 2r + 1; \\ 0, & \text{otherwise.} \end{cases}$$



In order to compute the spectrum of $A_r$, we note first that the matrix $J_{2r}$ has rank 1. The kernel of $J_{2r}$ is

$$\ker J_{2r} = \Big\{ (v_1, \dots, v_{2r}) \;\Big|\; \sum_{i=1}^{2r} v_i = 0 \Big\},$$

while the vector $\mathbf{1} = (1, \dots, 1)$ is the eigenvector of eigenvalue $2r$.

The spectrum of $P_r$ is not much more difficult to compute: the vector $\mathbf{1}$ is an eigenvector of $P_r$ as well as of $J_{2r}$, this time with eigenvalue 1. To compute the rest of the spectral decomposition, let $x$ be an eigenvector of $P_r$ orthogonal to $\mathbf{1}$, and let $\lambda$ be the corresponding eigenvalue. Then we have the following set of equations:

(1) $\sum_{j=1}^{2r} x_j = 0$,

$x_j = \lambda x_{2r-j+1}, \quad j = 1, \dots, 2r.$

Since at least one of the $x_j$ is not equal to zero, we see that $\lambda^2 = 1$, so $\lambda = \pm 1$. The orthogonality condition Eq. (1) can be rewritten as $(1 + \lambda) \sum_{j=1}^{r} x_j = 0$.

Suppose $\lambda = -1$. Then, Eq. (1) holds a fortiori, and so the eigenspace of $-1$ is $r$-dimensional. On the other hand, if $\lambda = 1$, then we have the additional constraint that $\sum_{j=1}^{r} x_j = 0$, so the eigenspace of 1 is $(r-1)$-dimensional. On the orthogonal complement of $\mathbf{1}$ the matrix $J_{2r}$ vanishes, so there $A_r = -P_r$ and the signs of these eigenvalues are exchanged. Putting this all together, we see that the spectrum of the adjacency matrix $A_r$ is $(2r-1, 1, \dots, 1, -1, \dots, -1)$, with $1$ occurring $r$ times and $-1$ occurring $r-1$ times. We see therefore:



Theorem 1.1. The number of cyclically reduced words of length $m$ in $F_r$ is equal to $(2r-1)^m + 1 + (r-1)[1 + (-1)^m]$.
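Theorem 1.1 is easy to check by machine for small parameters. The following is a minimal sketch in Python, not part of the original argument; all function names are ours. It builds $A_r = J_{2r} - P_r$ in the vertex order $a_1, \dots, a_r, A_r, \dots, A_1$, computes $\operatorname{tr} A_r^m$ in exact integer arithmetic, and compares the result with the closed formula and with a brute-force enumeration of cyclically reduced words.

```python
from itertools import product


def adjacency(r):
    """A_r = J_{2r} - P_r in the vertex order a_1..a_r, A_r..A_1 (0-indexed)."""
    n = 2 * r
    # Vertices i and n-1-i are mutually inverse letters, so that pair is the
    # only missing edge; every vertex keeps its loop.
    return [[0 if i + j == n - 1 else 1 for j in range(n)] for i in range(n)]


def mat_mul(a, b):
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def trace_of_power(a, m):
    """Trace of a^m for m >= 1, using exact integers."""
    power = a
    for _ in range(m - 1):
        power = mat_mul(power, a)
    return sum(power[i][i] for i in range(len(a)))


def closed_form(r, m):
    """The count stated in Theorem 1.1."""
    return (2 * r - 1) ** m + 1 + (r - 1) * (1 + (-1) ** m)


def brute_force(r, m):
    """Enumerate words of length m with no letter cyclically followed by its inverse."""
    n = 2 * r
    return sum(
        1
        for w in product(range(n), repeat=m)
        if all(w[(i + 1) % m] != n - 1 - w[i] for i in range(m))
    )
```

For instance, with $r = 2$ and $m = 3$ all three counts agree and equal $3^3 + 1 = 28$.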



2. Counting cyclically reduced words in homology classes 

Recall that the abelianization of $F_r$ is $\mathbb{Z}^r$, generated by the classes $[a_1], \dots, [a_r]$ of $a_1, \dots, a_r$ respectively. To compute the homology class of a word $w$ in $F_r$ we simply count the total exponents $e_1(w), \dots, e_r(w)$ of the generators used to write $w$. Then, $[w] = e_1(w)[a_1] + \dots + e_r(w)[a_r]$. In this section we will compute the following generating function:

$$\mathcal{H}_r^{(k)}(x_1, \dots, x_r) = \sum_{w \in W_k} \prod_{i=1}^{r} x_i^{e_i(w)},$$

where the sum is taken over the set $W_k$ of all cyclically reduced words $w$ in $a_1, \dots, a_r, A_1, \dots, A_r$ of length $k$.

To compute $\mathcal{H}_r^{(k)}$, we return to circuits in $\mathcal{G}_r$. Given a circuit $c = v_1, \dots, v_k, v_{k+1} = v_1$, the contribution of $c$ to $\mathcal{H}_r^{(k)}$ is the monomial $m_c$ given by the following iterative procedure: we start with 1, every time we see the vertex $a_i$, we multiply $m_c$ by $x_i$, and every time we see $A_i$, we multiply $m_c$ by $1/x_i$. From this, it follows that:

Theorem 2.1. The Laurent polynomial $\mathcal{H}_r^{(k)}$ is given by $\operatorname{tr} B_r^k$, where $B_r = D_r A_r$, where, in turn,

$$D_r = \operatorname{diag}(x_1, \dots, x_r, 1/x_r, \dots, 1/x_1).$$

Computing the trace of $B_r^k$ seems daunting at first, but one can use the approach we have used to prove Theorem 1.1. First, note that

$$B_r = D_r A_r = D_r J_{2r} - D_r P_r.$$

Evidently, the rank of $D_r J_{2r}$ is still equal to 1, and

$$\ker D_r J_{2r} = \Big\{ v = (v_1, \dots, v_{2r}) \;\Big|\; \sum_{j=1}^{2r} v_j = 0 \Big\}.$$

Note further that an eigenvector $v$ of $D_r P_r$ such that $v \in \ker D_r J_{2r}$, with associated eigenvalue $\lambda$, is also an eigenvector of $B_r$, with associated eigenvalue $-\lambda$. To find such an eigenvector, we must solve the system of equations (written out for the matrix $D_r$ of Theorem 2.1):

$$\sum_{j=1}^{2r} v_j = 0, \qquad \lambda v_j = x_j\, v_{2r-j+1} \ \ (j \le r), \qquad \lambda v_j = v_{2r-j+1}/x_{2r-j+1} \ \ (j > r).$$

We find, as before, that $\lambda = \pm 1$. The first equation reduces (almost as before) to

$$\sum_{j=1}^{r} (1 + \lambda/x_j)\, v_j = 0,$$

so that the eigenspaces of both $1$ and $-1$ are $(r-1)$-dimensional. What are the two remaining eigenvalues $\mu_1$ and $\mu_2$ of $B_r$? Note that since $\det D_r = 1$, we know that $\det B_r = \det A_r$. Note now that $\det B_r = \mu_1 \mu_2 (-1)^{r-1}$, while $\det A_r = (2r-1)(-1)^{r-1}$. So

(2) $\mu_1 \mu_2 = 2r - 1$.

On the other hand,

(3) $\mu_1 + \mu_2 = \operatorname{tr} B_r = \sum_{j=1}^{r} \Big( x_j + \frac{1}{x_j} \Big)$.

Denoting $y_r = \frac12 \sum_{j=1}^{r} (x_j + 1/x_j)$, we see that $\mu_1, \mu_2$ are the two roots of the equation $z^2 - 2 y_r z + (2r-1) = 0$, so that:

$$\mu_1 = y_r - \sqrt{y_r^2 - (2r-1)}, \qquad \mu_2 = y_r + \sqrt{y_r^2 - (2r-1)}.$$

The trace of $B_r^k$ is then equal to $\mu_1^k + \mu_2^k + (r-1)[1 + (-1)^k]$. This can be expressed in terms of well known special functions, if we make the substitution $y_r = \sqrt{2r-1}\, y_r'$. Then,

$$\mu_1^k + \mu_2^k = (2r-1)^{k/2} \Big[ \big(y_r' - \sqrt{y_r'^2 - 1}\big)^k + \big(y_r' + \sqrt{y_r'^2 - 1}\big)^k \Big],$$

and so

$$\mu_1^k + \mu_2^k = 2 \big(\sqrt{2r-1}\big)^k\, T_k(y_r'),$$

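where $T_k(x)$ is the $k$-th Chebyshev polynomial of the first kind. To simplify notation in the sequel, we define:

Definition 2.2.

$$R_n(c; x_1, \dots, x_k) = T_n\Big( \frac{c}{2k} \sum_{i=1}^{k} \Big(x_i + \frac{1}{x_i}\Big) \Big), \qquad S_n(c; x_1, \dots, x_k) = U_n\Big( \frac{c}{2k} \sum_{i=1}^{k} \Big(x_i + \frac{1}{x_i}\Big) \Big).$$

And to summarize: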

Theorem 2.3. The number of cyclically reduced words of length $k$ in $F_r$ homologous to $e_1[a_1] + \dots + e_r[a_r]$ is equal to the coefficient of $x_1^{e_1} \cdots x_r^{e_r}$ in

(4) $2 \big(\sqrt{2r-1}\big)^k R_k\Big( \frac{r}{\sqrt{2r-1}}; x_1, \dots, x_r \Big) + (r-1)[1 + (-1)^k]$.

Remark. The rescaled Chebyshev polynomial $T_k(ax)/a^k$ is called the $k$-th Dickson polynomial $T_k(x, a)$ (see [Schur73]).
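Theorem 2.3 can also be checked by direct enumeration. The sketch below is ours and only illustrative; the helper names are hypothetical. To stay in integer arithmetic it avoids the square roots in (4) by using the quantity $Q_k = \mu_1^k + \mu_2^k = 2(\sqrt{2r-1})^k T_k(y_r')$, which by (2) and (3) satisfies $Q_{k+1} = 2 y_r Q_k - (2r-1) Q_{k-1}$ with $Q_0 = 2$, $Q_1 = 2y_r$. Laurent polynomials are stored as dictionaries from exponent vectors to integer coefficients.

```python
from collections import Counter
from itertools import product


def brute_force_classes(r, k):
    """Count cyclically reduced words of length k by exponent vector (e_1..e_r)."""
    n = 2 * r  # vertices 0..r-1 are a_1..a_r; vertex n-1-i is the inverse of i
    counts = Counter()
    for w in product(range(n), repeat=k):
        if all(w[(i + 1) % k] != n - 1 - w[i] for i in range(k)):
            e = [0] * r
            for v in w:
                if v < r:
                    e[v] += 1
                else:
                    e[n - 1 - v] -= 1
            counts[tuple(e)] += 1
    return dict(counts)


def times_2y(poly, r):
    """Multiply a Laurent polynomial by 2*y_r = sum_j (x_j + 1/x_j)."""
    out = Counter()
    for e, coeff in poly.items():
        for j in range(r):
            for step in (1, -1):
                f = list(e)
                f[j] += step
                out[tuple(f)] += coeff
    return out


def formula_classes(r, k):
    """Nonzero coefficients of expression (4) for k >= 1, via the recurrence for Q_k."""
    zero = (0,) * r
    prev = Counter({zero: 2})                 # Q_0 = 2
    cur = times_2y(Counter({zero: 1}), r)     # Q_1 = 2*y_r
    for _ in range(k - 1):
        nxt = times_2y(cur, r)
        for e, coeff in prev.items():
            nxt[e] -= (2 * r - 1) * coeff
        prev, cur = cur, nxt
    cur[zero] += (r - 1) * (1 + (-1) ** k)    # contribution of the eigenvalues +1 and -1
    return {e: c for e, c in cur.items() if c != 0}
```

For $r = 2$, $k = 2$ both functions give one word in each of the classes $(\pm 2, 0)$, $(0, \pm 2)$ and two words in each of the classes $(\pm 1, \pm 1)$, and no word in the trivial class.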

3. Some facts about Chebyshev polynomials 



The literature on Chebyshev polynomials is enormous; [Rivlin90] is a good place to start. Here, we shall supply the barest essentials in an effort to keep this paper self-contained.

There are a number of ways to define Chebyshev polynomials (almost as many as there are of spelling their inventor's name). A standard definition of the Chebyshev polynomial of the first kind $T_n(x)$ is:

(5) $T_n(x) = \cos(n \arccos x)$.

In particular, $T_0(x) = 1$, $T_1(x) = x$. Using the identity

(6) $\cos(x + y) + \cos(x - y) = 2 \cos x \cos y$

we immediately find the three-term recurrence for Chebyshev polynomials:

(7) $T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x)$.

The definition of Eq. (5) can be used to give a "closed form" used in Sections 2 and 5:

(8) $T_n(x) = \frac12 \Big[ \big(x - \sqrt{x^2 - 1}\big)^n + \big(x + \sqrt{x^2 - 1}\big)^n \Big]$.

Indeed, let $x = \cos\theta$. Then $\big(x - \sqrt{x^2 - 1}\big)^n = \exp(-in\theta)$, while $\big(x + \sqrt{x^2 - 1}\big)^n = \exp(in\theta)$, so $\frac12 \big[ (x - \sqrt{x^2-1})^n + (x + \sqrt{x^2-1})^n \big] = \Re \exp(in\theta) = \cos n\theta$.

Though we will not have too many occasions to use them, we also define Chebyshev polynomials of the second kind $U_n(x)$, which can again be defined in a number of ways, one of which is:

(9) $U_n(x) = \frac{1}{n+1}\, T_{n+1}'(x)$.

A simple manipulation shows that if we set $x = \cos\theta$, as before, then

(10) $U_n(x) = \frac{\sin((n+1)\theta)}{\sin\theta}$.

In some ways, Schur's notation $\mathcal{U}_n = U_{n-1}$ is preferable. In any case, we have $U_0(x) = 1$, $U_1(x) = 2x$, and otherwise the $U_n$ satisfy the same recurrence as the $T_n$, to wit,

(11) $U_{n+1}(x) = 2x U_n(x) - U_{n-1}(x)$.

From the recurrences, it is clear that for $f = T, U$, $f_n(-x) = (-1)^n f_n(x)$, or, in other words, every second coefficient of $T_n(x)$ and $U_n(x)$ vanishes. The remaining coefficients alternate in sign; here is the explicit formula for the coefficient $c^{(n)}_{n-2m}$ of $x^{n-2m}$ of $T_n(x)$:

(12) $c^{(n)}_{n-2m} = (-1)^m \frac{n}{n-m} \binom{n-m}{m} 2^{n-2m-1}, \quad m = 0, 1, \dots, \lfloor n/2 \rfloor$.

This can be proved easily using Eq. (7).
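As a sanity check on the coefficient formula, here is a small Python sketch (ours, with hypothetical function names) that generates the coefficients of $T_n$ from the three-term recurrence and compares them with the closed expression $(-1)^m \frac{n}{n-m}\binom{n-m}{m}2^{n-2m-1}$ for the coefficient of $x^{n-2m}$, using exact rational arithmetic.

```python
from fractions import Fraction
from math import comb


def chebyshev_T(n):
    """Coefficients of T_n (lowest degree first) from T_{n+1} = 2x T_n - T_{n-1}."""
    prev, cur = [1], [0, 1]
    if n == 0:
        return prev
    for _ in range(n - 1):
        nxt = [0] + [2 * c for c in cur]      # 2x * T_n
        for i, c in enumerate(prev):          # minus T_{n-1}
            nxt[i] -= c
        prev, cur = cur, nxt
    return cur


def coefficient_formula(n, m):
    """Closed form for the coefficient of x^(n-2m) in T_n, valid for n >= 1."""
    return (Fraction((-1) ** m * n * comb(n - m, m), n - m)
            * Fraction(2) ** (n - 2 * m - 1))
```

For example, `chebyshev_T(4)` returns `[1, 0, -8, 0, 8]`, that is $T_4(x) = 8x^4 - 8x^2 + 1$, and the formula reproduces the coefficients $8$, $-8$, $1$ for $m = 0, 1, 2$.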

4. Analysis of the functions $R_n$ and $S_n$.

In view of the alternation of the coefficients, the appearance of the Chebyshev polynomials as generating functions in Section 2 seems a bit surprising, since combinatorial generating functions have non-negative coefficients. Below we state and prove a generalization. Remarkably, Theorems 4.1 and 4.2 do not seem to have been previously noted.

Theorem 4.1. Let $c > 1$. Then all the coefficients of $R_n(c; x)$ are non-negative. Indeed the coefficients of $x^{-n}, x^{-n+2}, \dots, x^{n-2}, x^{n}$ are positive, while the other coefficients are zero. The same is true of $S_n$ in place of $R_n$.

Proof. Let $a_n^k$ be the coefficient of $x^k$ in $U_n((c/2)(x + 1/x))$. The recurrence (11) gives the following recurrence for the $a_n^k$:

(13) $a_{n+1}^k = c\,(a_n^{k-1} + a_n^{k+1}) - a_{n-1}^k$.

Now we shall show that the following always holds:

(a): $a_n^k \ge 0$ (inequality being strict if and only if $n - k$ is even).

(b): $a_n^k \ge \max(a_{n-1}^{k-1}, a_{n-1}^{k+1})$, the inequality strict, again, if and only if $n - k$ is even.

(c): $a_n^k \ge a_{n-2}^k$ (strictness as above).

The proof proceeds routinely by induction; first the induction step (we assume throughout that the parity is the one for which the quantities involved can be nonzero, here $n + 1 - k$ even; all the quantities involved are obviously 0 otherwise):

By induction $a_{n-1}^k \le \min(a_n^{k-1}, a_n^{k+1})$, so by the recurrence (13) it follows that $a_{n+1}^k > \max(a_n^{k-1}, a_n^{k+1})$. (a) and (c) follow immediately.

For the base case, we note that $a_0^0 = 1$, while $a_1^1 = a_1^{-1} = c > 1$, and so the result for $U_n$ follows. Notice that the above proof does not work for $T_n$, since the base case fails. Indeed, if $b_n^k$ is the coefficient of $x^k$ in $T_n((c/2)(x + 1/x))$, then $b_0^0 = 1$, while $b_1^1 = c/2$, not necessarily bigger than one. However, we can use the result for $U_n$, together with the observation (which follows easily from the addition formula for sin) that

(14) $T_n(x) = \frac{U_n(x) - U_{n-2}(x)}{2}$.

Eq. (14) implies that $b_n^k = \frac12 (a_n^k - a_{n-2}^k) \ge 0$, by (c) above. □

The proof above goes through almost verbatim to show: 

Theorem 4.2. Let $c > 1$. Then all the coefficients of $R_n(c; x_1, \dots, x_k)$ are non-negative. The same is true of $S_n$ in place of $R_n$.

To complete the picture, we note that: 

Theorem 4.3.

$$R_n(1; x) = \frac12 \Big( x^n + \frac{1}{x^n} \Big).$$

Proof. Let $x = \exp(i\theta)$. Then $\frac12 (x + 1/x) = \cos\theta$, and $R_n(1; x) = T_n(\frac12 (x + 1/x)) = \cos n\theta = \frac12 (x^n + 1/x^n)$. □

Remark 4.4. For $c < -1$ it is true that all the coefficients of $R_n(c; \cdot)$ and $S_n(c; \cdot)$ have the same sign, but the sign is $(-1)^n$. For $|c| < 1$, the result is completely false. For $c$ imaginary, the result is true. I am not sure what happens for general complex $c$.



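By the formula (12), we can write

(15) $T_n\Big( \frac{c}{2} \Big( x + \frac1x \Big) \Big) = \frac12 \sum_{m=0}^{\lfloor n/2 \rfloor} (-1)^m \frac{n}{n-m} \binom{n-m}{m} c^{n-2m} \Big( x + \frac1x \Big)^{n-2m}$.

Noting that

(16) $\Big( x + \frac1x \Big)^n = \sum_{i=0}^{n} \binom{n}{i} x^{n-2i}$,

we obtain the expansion

(17) $R_n(c; x) = \frac12 \sum_{k=-n}^{n} x^k \sum_{m=0}^{\lfloor n/2 \rfloor} (-1)^m c^{n-2m} \frac{n}{n-m} \binom{n-m}{m} \binom{n-2m}{(n-2m-k)/2}$,

where it is understood that $\binom{a}{b}$ is 0 if $b < 0$, or $b > a$, or $b \notin \mathbb{Z}$. We shall denote the coefficient of $x^k$ by $t(n, k, c)$.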
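The double-sum expression for $t(n, k, c)$ and the positivity statements of this section can be tested numerically. The sketch below is ours (hypothetical names, exact rational arithmetic): `t` evaluates the explicit sum, with the stated convention for vanishing binomial coefficients, while `laurent_coefficients` expands $T_n((c/2)(x + 1/x))$ independently from the three-term recurrence.

```python
from fractions import Fraction
from math import comb


def t(n, k, c):
    """Coefficient of x^k in R_n(c; x) from the explicit double sum, for n >= 1."""
    total = Fraction(0)
    for m in range(n // 2 + 1):
        twice_b = n - 2 * m - k
        if twice_b < 0 or twice_b % 2 or twice_b // 2 > n - 2 * m:
            continue  # binomial coefficient understood to be 0
        total += (Fraction((-1) ** m * n * comb(n - m, m), n - m)
                  * Fraction(c) ** (n - 2 * m)
                  * comb(n - 2 * m, twice_b // 2))
    return total / 2


def laurent_coefficients(n, c):
    """Coefficients of T_n((c/2)(x + 1/x)), n >= 1, via T_{n+1} = 2z T_n - T_{n-1}."""
    c = Fraction(c)
    prev = {0: Fraction(1)}            # T_0 = 1
    cur = {1: c / 2, -1: c / 2}        # T_1 = z = (c/2)(x + 1/x)
    for _ in range(n - 1):
        nxt = {}
        for k, v in cur.items():       # 2z * T_n = c (x + 1/x) T_n
            for step in (1, -1):
                nxt[k + step] = nxt.get(k + step, Fraction(0)) + c * v
        for k, v in prev.items():      # minus T_{n-1}
            nxt[k] = nxt.get(k, Fraction(0)) - v
        prev, cur = cur, nxt
    return cur
```

With $c = 1$ only the coefficients of $x^{\pm n}$ survive, each equal to $1/2$, in agreement with Theorem 4.3; with $c = 3/2$ every coefficient of $x^k$ with $n - k$ even and $|k| \le n$ comes out strictly positive for the small $n$ tried in the accompanying checks.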
5. Limiting distribution of coefficients

While the formula (17) is completely explicit, and a similar (though somewhat more cumbersome) expression could be obtained for $R_n(c; x_1, \dots, x_k)$, for many purposes it is more useful to have a limiting distribution formula as given by Theorem 5.1 below. To set up the framework, we note that since all the coefficients of $R_n(c; x_1, \dots, x_k)$ are non-negative (according to Theorem 4.2), they can be thought of as defining a probability distribution on the integer lattice $\mathbb{Z}^k$, defined by $p(l_1, \dots, l_k) = [x_1^{l_1} x_2^{l_2} \cdots x_k^{l_k}]\, R_n(c; x_1, \dots, x_k) / R_n(c; 1, \dots, 1)$ (where the square brackets mean that we are extracting the coefficient of the bracketed monomial). Call the resulting probability distribution $\mathcal{P}_n(c; z)$, where $z$ now denotes a $k$-dimensional vector.

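Theorem 5.1. With notation as above, when $c > 1$, the probability distributions $\mathcal{P}_n(c; z/\sqrt{n})$ converge to a normal distribution on $\mathbb{R}^k$, whose mean is 0, and whose covariance matrix $C$ is diagonal, with entries

$$\sigma^2 = \frac{1}{k}\, \frac{c}{\sqrt{c^2 - 1}} = \frac{1}{k} \Big( 1 + \frac{1}{c^2 - 1} \Big)^{1/2}.$$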

To prove Theorem 5.1 we will use the method of characteristic functions (Fourier transforms), and more specifically at first the Continuity Theorem ([Feller71, Chapter XV.3, Theorem 2]):

Theorem 5.2. In order that a sequence $\{F_n\}$ of probability distributions converges properly to a probability distribution $F$, it is necessary and sufficient that the sequence $\{\varphi_n\}$ of their characteristic functions converges pointwise to a limit $\varphi$, and that $\varphi$ is continuous in some neighborhood of the origin.

In this case $\varphi$ is the characteristic function of $F$. (Hence $\varphi$ is continuous everywhere and the convergence $\varphi_n \to \varphi$ is uniform on compact sets).



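The characteristic function of $\mathcal{P}_n(c; z)$ is simply

$$\varphi_n(\theta_1, \dots, \theta_k) = \frac{R_n(c; \exp(i\theta_1), \dots, \exp(i\theta_k))}{R_n(c; 1, \dots, 1)}.$$

By definition of $R_n$,

$$R_n(c; \exp(i\theta_1), \dots, \exp(i\theta_k)) = T_n\Big( \frac{c}{k} \sum_{j=1}^{k} \cos\theta_j \Big), \qquad R_n(c; 1, \dots, 1) = T_n\Big( \frac{c}{k} \sum_{j=1}^{k} \cos 0 \Big) = T_n(c).$$

We now use the form of Eq. (8):

$$T_n(x) = \frac12 \Big[ \big(x - \sqrt{x^2 - 1}\big)^n + \big(x + \sqrt{x^2 - 1}\big)^n \Big],$$

setting

$$u = \frac{c}{k} \sum_{j=1}^{k} \cos\frac{\theta_j}{\sqrt{n}}$$

and

$$\theta = (\theta_1, \dots, \theta_k),$$

we get

(18) $\varphi_n(\theta/\sqrt{n}) = \frac{1}{T_n(c)} \cdot \frac12 \Big\{ \big(u + \sqrt{u^2 - 1}\big)^n + \big(u - \sqrt{u^2 - 1}\big)^n \Big\}$.

Notice, however, that for $c > 1$, the ratio of the second term in braces to the first is exponentially small as $n \to \infty$, since the first term grows like $(c + \sqrt{c^2 - 1})^n$, while the second as $(c - \sqrt{c^2 - 1})^n$ (since $\cos(\theta_j/\sqrt{n}) \to 1$). Since, for the same reason, $2T_n(c) = (c + \sqrt{c^2 - 1})^n [1 + o(1)]$, we can write:

$$\varphi_n(\theta/\sqrt{n}) = \Big( \frac{u + \sqrt{u^2 - 1}}{c + \sqrt{c^2 - 1}} \Big)^n + o(1).$$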



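Substituting the Taylor expansions for the cosine terms (hidden in $u$ for typesetting reasons), we get:

(19) $\frac{k}{c}\, u = k - \frac{1}{2n} \langle\theta, \theta\rangle + o(1/n)$,

so

(20) $u = c - \frac{c}{2kn} \langle\theta, \theta\rangle + o(1/n)$.

A similar computation gives

(21) $u^2 = c^2 - \frac{c^2}{kn} \langle\theta, \theta\rangle + o(1/n)$.

Substituting the last expansion into the square root, we see that

$$\sqrt{u^2 - 1} = \sqrt{c^2 - 1}\, \Big[ 1 - \frac{c^2}{2kn(c^2 - 1)} \langle\theta, \theta\rangle \Big] + o(1/n).$$

Adding Eq. (20) and collecting terms, we get

(22) $u + \sqrt{u^2 - 1} = \big(c + \sqrt{c^2 - 1}\big) \Big[ 1 - \frac{c}{2kn\sqrt{c^2 - 1}} \langle\theta, \theta\rangle \Big] + o(1/n)$.

Performing some further simplifications, we see that

$$\varphi_n(\theta/\sqrt{n}) = \Big( 1 - \frac{1}{2n}\, \theta^t C \theta \Big)^n + o(1),$$

where $C$ is the covariance matrix described in the statement of Theorem 5.1, and so $\varphi_n(\theta/\sqrt{n}) \to \exp(-\frac12 \theta^t C \theta)$ as $n \to \infty$.

Theorem 5.1 follows immediately.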
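The limiting variance can be checked numerically in the univariate case $k = 1$. The sketch below is ours and rests on one assumption that should be stated loudly: it takes the variance to be $c/(k\sqrt{c^2-1})$, which is the value the expansion of $u + \sqrt{u^2 - 1}$ around $c$ produces. The code computes the exact coefficient distribution $\mathcal{P}_n(c; \cdot)$ for $k = 1$ and compares its variance, divided by $n$, with that value.

```python
from fractions import Fraction
from math import sqrt


def coefficient_distribution(n, c):
    """Coefficients of T_n((c/2)(x + 1/x)) divided by T_n(c), for n >= 1 (case k = 1)."""
    c = Fraction(c)
    prev = {0: Fraction(1)}
    cur = {1: c / 2, -1: c / 2}
    for _ in range(n - 1):
        nxt = {}
        for k, v in cur.items():
            for step in (1, -1):
                nxt[k + step] = nxt.get(k + step, Fraction(0)) + c * v
        for k, v in prev.items():
            nxt[k] = nxt.get(k, Fraction(0)) - v
        prev, cur = cur, nxt
    total = sum(cur.values())          # value at x = 1, that is T_n(c)
    return {k: v / total for k, v in cur.items()}


def variance_per_step(n, c):
    """Variance of the exact distribution, divided by n."""
    dist = coefficient_distribution(n, c)
    mean = sum(k * p for k, p in dist.items())
    return float(sum((k - mean) ** 2 * p for k, p in dist.items())) / n


def limiting_variance(c, k=1):
    """Assumed limiting variance c / (k * sqrt(c^2 - 1))."""
    return float(c) / (k * sqrt(float(c) ** 2 - 1))
```

For $c = 5/4$ the assumed limit is $5/3$, and already at $n = 20$ the exact variance per step agrees with it to many decimal places; in this univariate case the exact variance is $n c\, U_{n-1}(c)/T_n(c)$, so the discrepancy decays exponentially.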



Remark 5.3. The speed of convergence in Theorem 5.1 can be estimated using standard technology (see [Feller71, Chapter XVI], [Shiryaev96, Chapter III.11]), but the speed of convergence in practice (as checked by numerical experiments) seems to be much better than the general estimates. Indeed the difference between $\mathcal{P}_n$ and the normal distribution appears to decrease almost exactly linearly in $n$.






IGOR RIVIN 



6. Distribution mod p 

The explicit generating functions derived above can be used to study the distribution of cyclically reduced words in $F_r$ with respect to their mod $p$ homology class (this is the analogue, in this setting, of the work of [PS87]).

Theorem 6.1. Let $h_1$ and $h_2$ be two elements of $H_1(F_r, \mathbb{Z}/p\mathbb{Z}) = (\mathbb{Z}/p\mathbb{Z})^r$, and let $W_{r,n,h_1}$ and $W_{r,n,h_2}$ be the numbers of cyclically reduced words of length $n$ in $F_r$ homologous to $h_1$ and $h_2$, respectively. Then,

(23) $\lim_{n \to \infty} \frac{W_{r,n,h_1}}{W_{r,n,h_2}} = 1$.

Proof. By elementary algebra (in one dimension, formula (25)), the statement of the theorem is equivalent to the statement that

(24) $\lim_{n \to \infty} \varphi_n(\theta) = 0$

for $\theta = (2n_1\pi/p, \dots, 2n_r\pi/p)$, with not all $n_j$ equal to 0 mod $p$, where $\varphi_n$ is the characteristic function defined in the previous section.

The estimate of Eq. (24), however, follows immediately from the explicit formula (18); indeed, in the current context (with $k = r$),

$$u(\theta) = \frac{c}{k} \sum_{j=1}^{k} \cos(2n_j\pi/p),$$

which is strictly smaller than $u(0)$, so the ratio of $\varphi_n(\theta)$ to $\varphi_n(0)$ goes to zero exponentially fast in $n$. □

6.1. Deviation from uniformity. Although the distribution of homology mod $p$ approaches uniformity, it turns out that there is a persistent bias in favor of certain homology classes. This is very much akin to the Chebyshev bias, analyzed in [RS94]. To simplify the discussion we project one more time: for each cyclically reduced word in $F_r$ homologous to $a_1^{k_1} a_2^{k_2} \cdots a_r^{k_r}$ we consider $k_1 + \dots + k_r$ mod $p$. In this case we have a univariate distribution, whose generating function is given by $\psi_n(x) = R_n(c; x, \dots, x)$, with $c = r/\sqrt{2r-1}$ (as per formula (4); we leave in the general $c$, to underline that our results apply to the general question on the distribution of coefficients of the Laurent polynomials $R_n$). In the formulas below $\psi_n$ is normalized by $R_n(c; 1, \dots, 1) = T_n(c)$, so that $\psi_n(1) = 1$; this is the source of the factors $T_n(c)$.

The number of elements congruent to $q$ mod $p$ is given by

(25) $\mathcal{N}_{n,q} = \frac{1}{p} \sum_{j=0}^{p-1} \chi^{-qj}\, \psi_n(\chi^j)$,

where $\chi = \exp(2\pi i/p)$ is a primitive $p$-th root of unity. Let us recall that

(26) $\psi_n(\exp(ix))\, T_n(c) = \frac12 \Big\{ \big( c\cos x + \sqrt{c^2\cos^2 x - 1} \big)^n + \big( c\cos x - \sqrt{c^2\cos^2 x - 1} \big)^n \Big\}$.

Note the following properties of the function $\psi_n$:

(a): $\psi_n(1/x) = \psi_n(x)$,

(b): If $c\cos x \le 1$, then $|\psi_n(\exp(ix))|\, T_n(c) \le 1$.

(c): $\psi_n(\exp(i(\pi - x))) = (-1)^n \psi_n(\exp(ix))$.

(d): If $c\cos x > 1$, then $\psi_n(\exp(ix)) > 0$.

(e): If $x \in [0, \arccos 1/c]$, and $n$ tends to infinity, then

(27) $\frac{|\psi_n(\exp(ix))|\, T_n(c)}{\big( c\cos x + \sqrt{c^2\cos^2 x - 1} \big)^n} \to \frac12$,

and so

(f): $\psi_n(\exp(ix_1)) = o(\psi_n(\exp(ix_2)))$ for $0 \le x_2 < \arccos 1/c$, and $x_2 < x_1 < \pi - x_2$.

Using property (a), we can write

(28) $\mathcal{N}_{n,q} = \frac{1}{p} \Big[ \psi_n(1) + 2 \sum_{j=1}^{(p-1)/2} \cos\frac{2\pi q j}{p}\, \psi_n(\chi^j) \Big]$.

Since $\cos\frac{2\pi m}{p}$ is monotonically decreasing as a function of $m$ for $0 \le m \le \frac{p-1}{2}$, we see:

Theorem 6.2. For sufficiently large even $n$, $\mathcal{N}_{n,q} < \mathcal{N}_{n,0}$ for $q \not\equiv 0 \bmod p$.

Proof. This is an immediate consequence of the monotonicity of cos, equation (28) and properties (a), (c), (d), and (f) above. □

For $q \not\equiv 0 \bmod p$, the term largest in absolute value in the sum (aside from the $\psi_n(1)$ term) on the right hand side of eq. (28) is the $\psi_n(\chi^{(p-1)/2})$ term, so if we assume that $n$ is even, then the next largest (after $\mathcal{N}_{n,0}$) term will be $\mathcal{N}_{n,p-2}$ (since $(p-2)[(p-1)/2] \equiv 1 \bmod p$), then $\mathcal{N}_{n,p-4}$, and so on. For $n$ odd, the ordering is
reversed.
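The counts of cyclically reduced words by total exponent mod $p$ can be tabulated exactly for small parameters, which makes it possible to look at the bias directly. The following Python sketch is ours (the function name is hypothetical) and uses a plain dynamic program over closed walks on $\mathcal{G}_r$ rather than the root-of-unity formula, so it involves only integers.

```python
def class_counts_mod_p(r, n, p):
    """Counts of cyclically reduced words of length n >= 1 in F_r,
    indexed by the total exponent sum k_1 + ... + k_r mod p."""
    size = 2 * r                       # vertices a_1..a_r, A_r..A_1
    weight = [1 if v < r else -1 for v in range(size)]
    totals = [0] * p
    for start in range(size):
        # ways[v][s]: walks from start to v whose exponent sum is s mod p
        ways = [[0] * p for _ in range(size)]
        ways[start][weight[start] % p] = 1
        for _ in range(n - 1):
            new = [[0] * p for _ in range(size)]
            for v in range(size):
                for s in range(p):
                    cnt = ways[v][s]
                    if cnt:
                        for w in range(size):
                            if w != size - 1 - v:      # never follow a letter by its inverse
                                new[w][(s + weight[w]) % p] += cnt
            ways = new
        # Close the circuit: the last letter must not be the inverse of the first.
        for v in range(size):
            if v != size - 1 - start:
                for s in range(p):
                    totals[s] += ways[v][s]
    return totals
```

For example, `class_counts_mod_p(2, 2, 3)` returns `[4, 4, 4]`: the twelve cyclically reduced words of length 2 in $F_2$ are split evenly, so any bias only shows up for larger $n$. Printing the list for a range of even and odd $n$ is a quick way to observe the ordering discussed above.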

7. An extension and limiting distributions for graphs 



An inspection of the proof of Theorem 5.1 reveals that in order to show that for a sequence of probability distributions $\{P_n(x)\}$ on $\mathbb{Z}$, the distributions $\{P_n(x/\sqrt{n})\}$ converged to a limiting normal distribution with mean 0, we used the following conditions (we will state them in a univariate setting for simplicity; the multivariate case is the same):

Condition 1. The characteristic function of $\{P_n\}$ has the form

$$\chi(P_n)(\theta) = f_1(\theta)\, f_2^n(\theta) + o(1),$$

where $f_j(\theta)$ is twice continuously differentiable at 0, so that $f_j(\theta) = a_j + b_j\theta + c_j\theta^2 + o(\theta^2)$.

Condition 2.

$$a_1 = 1, \quad b_2 = 0, \quad c_2 < 0.$$
Suppose now we generalize the setting of Section 1 as follows:

Let $\mathcal{G}$ be a connected $r$-regular non-bipartite graph, directed or not (possibly with self-loops and multiple edges), on $k$ vertices. Let $v_1$ and $v_2$ be two vertices of $\mathcal{G}$. Consider now the set $W_N$ of all closed walks (circuits) of length $N$ on $\mathcal{G}$. Let $f : V(\mathcal{G}) \to \mathbb{R}$ be a function assigning a weight to each vertex of $\mathcal{G}$, and define a random variable $X_f$ to be $\sum_{i=1}^{N} f(v_i)$ for $w = v_1, \dots, v_N \in W_N$. What can we say about the distribution of $X_f$? It turns out that asymptotically we can say a lot. First, however, define

$$\mu(f) = \frac{1}{k} \sum_{v \in V(\mathcal{G})} f(v)$$

and $\mathbf{f}_0 = \mathbf{f} - \mu(f)\mathbf{1}$. Define further the Laplacian $\Delta(\mathcal{G})$ of $\mathcal{G}$ to be $\Delta(\mathcal{G}) = rI - A(\mathcal{G})$, and define $\Delta_0(\mathcal{G})$ to be $\Delta(\mathcal{G})$ viewed as an operator on the orthogonal complement to $\mathbf{1}$ (that is, vectors with 0 sum). Let $P_N(x)$ be the distribution of $X_f$ on $W_N$.



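Theorem 7.1. The distributions $P_N((x - N\mu(f))/\sqrt{N})$ converge to a balanced (that is, mean 0) normal distribution with variance

(29) $\sigma^2(f) = \frac{1}{k} \Big[ -\|\mathbf{f}_0\|^2 + 2r\, \mathbf{f}_0^{*} \Delta_0^{-1}(\mathcal{G})\, \mathbf{f}_0 \Big] = \frac{1}{k} \Big[ \mathbf{f}_0^{*} \big( -I_0 + 2r \Delta_0^{-1}(\mathcal{G}) \big) \mathbf{f}_0 \Big]$.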



Proof. Exactly as in Section 1 we construct a generating function for $X_f$ on $W_N$. To do this, let $A$ be the adjacency matrix of $\mathcal{G}$, and let

$$D_k(x) = \operatorname{diag}\big( x^{f(v_1)}, \dots, x^{f(v_k)} \big).$$

Then, $g_N(x) = \operatorname{tr} (D_k(x)A)^N = \sum_{j=1}^{k} \lambda_j^N(D_k(x)A)$, where $\lambda_1, \dots, \lambda_k$ are eigenvalues, and, just as in Section 5, we have $\chi(P_N)(\theta) = g_N(\exp(i\theta))/c_N$, where

$$c_N = |W_N| = \sum_{j=1}^{k} \lambda_j^N(A).$$

Since $\mathcal{G}$ is an $r$-regular, non-bipartite graph, it has a unique eigenvalue of maximal modulus, and that eigenvalue is $\lambda_1 = r$.

Now, we can directly apply Conditions 1 and 2 (and accompanying comments) above, and the results of Section 10 (noting that Assumptions 1-4 hold) to obtain the desired result (in particular, the estimate needed in Condition 2 is precisely Theorem 10.5). We replaced the resolvent in formula (54) by the equivalent (by the discussion in the beginning of Section 10) Laplacian form, since that is more common in graph theory. □
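A small numerical experiment illustrates the statement. The sketch below is ours and makes two assumptions that are not spelled out in the text: the variance formula is read as $\sigma^2(f) = \frac1k[-\|\mathbf{f}_0\|^2 + 2r\,\mathbf{f}_0^{*}\Delta_0^{-1}\mathbf{f}_0]$, and the test graph is the complete graph $K_4$ ($r = 3$, $k = 4$), for which $\Delta_0 = 4I$ on vectors of zero sum. The code tabulates $X_f$ over all closed walks of length $N$ exactly and compares the empirical variance per step with $\sigma^2(f)$.

```python
from fractions import Fraction


def closed_walk_histogram(adj, f, N):
    """Number of closed walks v_1..v_N (with v_{N+1} = v_1), N >= 1,
    for each value of the sum f(v_1) + ... + f(v_N)."""
    k = len(adj)
    hist = {}
    for start in range(k):
        # ways[v][s] = number of walks from start to v with weight sum s
        ways = [dict() for _ in range(k)]
        ways[start][f[start]] = 1
        for _ in range(N - 1):
            new = [dict() for _ in range(k)]
            for v in range(k):
                for s, cnt in ways[v].items():
                    for w in range(k):
                        if adj[v][w]:
                            key = s + f[w]
                            new[w][key] = new[w].get(key, 0) + cnt * adj[v][w]
            ways = new
        for v in range(k):
            if adj[v][start]:          # close the circuit
                for s, cnt in ways[v].items():
                    hist[s] = hist.get(s, 0) + cnt * adj[v][start]
    return hist


def variance(hist):
    """Exact variance of the value distribution described by hist."""
    total = sum(hist.values())
    mean = Fraction(sum(s * c for s, c in hist.items()), total)
    return sum(c * (s - mean) ** 2 for s, c in hist.items()) / total


# Complete graph K_4: connected, 3-regular, non-bipartite.
K4 = [[0 if i == j else 1 for j in range(4)] for i in range(4)]
f = [0, 1, 2, 3]
k, r = 4, 3
mu = Fraction(sum(f), k)
norm2 = sum((x - mu) ** 2 for x in f)          # ||f_0||^2
# On zero-sum vectors, rI - A = 3I - (J - I) acts as 4I, so
# f_0^* Delta_0^{-1} f_0 = ||f_0||^2 / 4.
sigma2 = (-norm2 + 2 * r * norm2 / 4) / k
```

Here `sigma2` evaluates to $5/8$, and `variance(closed_walk_histogram(K4, f, 30)) / 30` agrees with it to high accuracy, since the contribution of the non-principal eigenvalues is of relative size $3^{-N}$.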

Remark 7.2. If the vector $\mathbf{f}$ is an eigenvector of $A^*A$ with eigenvalue $r^2$, the corresponding variance is equal to zero. By Remark 10.1 this will not happen, e.g., if $\mathcal{G}$ is a connected non-bipartite undirected graph, but it does happen for general directed graphs; see the discussion of the directed line graph in Section 8.

The above remark leads to the following

Question: What combinatorial property of an $r$-regular directed graph $\mathcal{G}$ is reflected in the algebraic statement that the operator norm of $A_0(\mathcal{G})$ is equal to $r$?



A slight change in notation transforms Theorem 7.1 into a central limit theorem for distributions over closed orbits of primitive irreducible Markov processes over a finite number of states - the irreducibility is exactly equivalent to the connectivity of the graph $\mathcal{G}$ above. For ease of reference we state this as a separate theorem. The notation for $\mathbf{f}$, $\mu$, etc., is as before; the space $W_N$ is now a probability space
with the obvious probability measure; P = P* is the transition matrix (note that 



Remark 7.2 remains valid in this setting as well).

Theorem 7.3. Let $P_N(x)$ be the distribution of $X_f$ on $W_N$. Then $P_N((x - N\mu(f))/\sqrt{N})$ converge to a balanced (that is, mean 0) normal distribution with variance

(30) $\sigma^2(f) = \frac{1}{k} \Big[ -\|\mathbf{f}_0\|^2 + 2\, \mathbf{f}_0^{*} (I_0 - P_0)^{-1} \mathbf{f}_0 \Big] = \frac{1}{k} \Big[ \mathbf{f}_0^{*} \big( -I_0 + 2 (I_0 - P_0)^{-1} \big) \mathbf{f}_0 \Big]$.

Remark 7.4. We have actually shown a slightly stronger result: instead of the trace (distribution over cycles), we could have considered the $ij$-th element of $P$. Since the principal eigenvector varies continuously under perturbations (see [Kato66, Chapter II.4.1]), we could have replaced our sample space $W_N$ as above by the space $C_N$ of paths of length $N$ joining the $i$-th to the $j$-th vertex. An easy computation shows that the covariance is the covariance given in equation (30) divided by a further factor of $k$. The same remark applies to Theorem 7.1.



7.1. Distribution modulo a prime. Theorems 7.1 and 7.3 have particularly simple analogues if the function $f$ we are studying is integer valued, and we are interested in the distribution of the $\mathbb{Z}/p\mathbb{Z}$-valued random variable $Y_f(n)$ which assigns to each cycle of length $n$ the sum of the values of $f$ modulo $p$. In that case, under the assumption that the adjacency matrix $A$ (in the context of Theorem 7.1) or the transition matrix $A$ (in the context of Theorem 7.3) is irreducible and primitive (the last two conditions guarantee that $A$ has a single eigenvalue $\lambda_0$ of maximal modulus, the eigenspace of $\lambda_0$ is one-dimensional, and the orthogonal subspace is invariant under $A$), we see that the distributions $\mathcal{P}_n$ of $Y_f(n)$ approach the uniform distribution (on $\mathbb{Z}/p\mathbb{Z}$) exponentially fast in $n$ (though a more reasonable measure of the speed of convergence is the size of $W_n$, in which case the convergence is polynomial). This statement follows from the:

Lemma 7.5. If $A$ is a matrix satisfying the conditions above, then the spectral radius $r_{UA}$ of $UA$, for $U$ any non-trivial unitary matrix such that the top eigenvector of $A$ is not also an eigenvector of $U$, is strictly smaller than that of $A$ ($r_A$).

The proof of the lemma is immediate.

In our case, the matrix $U$ is the diagonal matrix $U(\chi)$ with $U_{jj} = \chi^{f(v_j)}$, with $\chi$ a non-trivial $p$-th root of unity. The speed of convergence to the uniform distribution is given by $\big( \max_{\chi^p = 1,\ \chi \ne 1} r(U(\chi)A) \big) / r(A)$.

8. Functions on edges and distributions over paths without backtracking

In this section we consider two kinds of questions, which are seen to be intimately related. The first is:

Question 1. Let $f$ be a function on the edges of $G$. How are the averages of $f$ over long cycles or paths in $G$ distributed?

The second sort of question is:

Question 2. Let $f$ be a function on the vertices of $G$. How are the averages of $f$ distributed over long cycles in $G$ without backtracking? Such cycles are more closely related to, e.g., geodesics on surfaces, than arbitrary cycles.

Both questions can be answered at the same time by constructing the directed line graph (or line digraph) of $G$. This construction can be performed for either a directed or undirected graph $G$; in Section 8.1 we will derive the results for undirected graphs in detail, whilst in Section 8.3 we will discuss the directed case somewhat more briefly (since the technical details are essentially identical).

8.1. The directed line graph of an undirected graph. The directed line graph of $G$, denoted by $\mathcal{L}(G)$, is constructed as follows: the vertices of $\mathcal{L}(G)$ are edges of $G$ labelled with a $+$ or a $-$; that is, to each edge $e$ of $G$ there correspond vertices $e_-$ and $e_+$ of $\mathcal{L}(G)$. These correspond to the two possible orientations of $e$: if the vertices of $e$ are $v$ and $w$, then we say that $v$ is the head of $e_-$, and $w$ the tail (and write $v = h(e_-)$, $w = t(e_-)$), while for $e_+$ this nomenclature is reversed. Two vertices $v_1$ and $v_2$ of $\mathcal{L}(G)$ are joined by a (directed) edge if the head of $v_1$ is the same as the tail of $v_2$, except that $e_-$ is never joined to $e_+$, and vice versa. We now make some observations and definitions.

Definition 8.1. Let $f$ be a function defined on the vertices of a graph $G$. We say that a function $g$ defined on the vertices of $\mathcal{L}(G)$ is the gradient of $f$, and write $g = \nabla f$, if $g(e) = f(h(e)) - f(t(e))$.

Definition 8.2. We can identify functions on the vertices of $G$ with (a subset of) functions on the vertices of $\mathcal{L}(G)$. To wit, if $f$ is a function on the vertices of $G$, we let $\mathcal{L}f(e) = f(h(e))$.

Observation 8.3. There is a natural correspondence between walks on $\mathcal{L}(G)$ and walks on $G$ without backtracking. Indeed, passing through a vertex $e$ of $\mathcal{L}(G)$ corresponds to going from $t(e)$ to $h(e)$. Since $e_+$ is not connected to $e_-$ for any $e \in E(G)$, any such walk is automatically without backtracking. Similarly, a cycle on $\mathcal{L}(G)$ corresponds to a tailless cycle without backtracking on $G$.

If $G$ is an $r$-regular graph, then $\mathcal{L}(G)$ is $(r-1)$-regular, in the strong sense: each
vertex of $\mathcal{L}(G)$ has in-degree and out-degree equal to $r-1$ (thus the total degree
is $2r-2$), and from Observation 8.3 above, $\mathcal{L}(G)$ is connected if and only
if $G$ is. It follows that the adjacency matrix $A(\mathcal{L}(G))$ of $\mathcal{L}(G)$ is an irreducible
nonnegative matrix, all of whose row and column sums are equal to $r-1$. It follows
that the space of functions on the vertices of $\mathcal{L}(G)$ orthogonal to the vector $\mathbf{1}$ is an
invariant subspace of $A(\mathcal{L}(G))$ and of $A^*(\mathcal{L}(G))$; we will, as before, denote the two
matrices restricted to this subspace by $A_0$ and $A_0^*$, respectively. The algebraic and
geometric multiplicities of the eigenvalue $r-1$ are both equal to 1, by standard Perron-Frobenius
theory. Despite this, it turns out that $A^*A$ is spectacularly degenerate.
Indeed, the $ij$-th entry of $A^*A$ is equal to the number of vertices of $\mathcal{L}(G)$ adjacent
simultaneously to the $i$-th and the $j$-th vertex. It follows that the $ii$-th entry of
$A^*A$ is equal to $r-1$, while the $ij$-th entry (for $i \neq j$) is equal to $r-2$ if the corresponding
directed edges of $G$ have the same tail, and is $0$ otherwise. It follows that

(31) $A^*A = I_{2E(G)} + (r-2)\begin{pmatrix} J_1 & & \\ & \ddots & \\ & & J_{V(G)} \end{pmatrix}$,

where the last term contains $V(G)$ blocks of size $r \times r$, each of which is the matrix of all
1s. We thus have the following observation:

Observation 8.4. The spectrum of $A^*A$ has the following form: The eigenvalue
$(r-1)^2$ occurs $V(G)$ times, and the corresponding eigenvectors are given precisely
by $\mathcal{L}f$ for arbitrary functions $f$ on $G$ (the Perron eigenvector corresponding to
the constant function), while the eigenvalue 1 occurs $2E(G) - V(G)$ times. The
eigenvectors for the eigenvalue 1 are those functions on the directed edges of $G$ for which, for all vertices
$v$ of $G$, the sum of values on all the edges leaving $v$ is equal to 0.
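
The block structure behind Observation 8.4 can be checked by hand on a small example. The following is only an illustrative sketch, not part of the original argument: it assumes a simple graph (here $K_4$, so $r = 3$), represents each directed edge as a hypothetical pair `(tail, head)`, and uses the helper names `directed_line_graph` and `gram`, which are introduced here for illustration.

```python
def directed_line_graph(edges):
    """Directed line graph of a simple undirected graph.

    Each undirected edge {u, v} gives two darts (tail, head): (u, v) and (v, u).
    Dart x points to dart y when head(x) == tail(y) and y is not the
    reversal of x (no backtracking).
    """
    darts = [(u, v) for u, v in edges] + [(v, u) for u, v in edges]
    adj = [[1 if (x[1] == y[0] and y != (x[1], x[0])) else 0 for y in darts]
           for x in darts]
    return darts, adj


def gram(adj):
    """Return A^T A for a square matrix given as a list of rows."""
    n = len(adj)
    return [[sum(adj[k][i] * adj[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


# K4 is 3-regular, so r = 3 (an assumption of this example).
r = 3
k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
darts, A = directed_line_graph(k4_edges)
G = gram(A)


def expected_entry(x, y):
    """Entry of I + (r - 2) * blockdiag(J): r-1 on the diagonal,
    r-2 for distinct darts sharing a tail, 0 otherwise."""
    if x == y:
        return r - 1
    return r - 2 if x[0] == y[0] else 0


structure_ok = all(G[i][j] == expected_entry(darts[i], darts[j])
                   for i in range(len(darts)) for j in range(len(darts)))

# A function that is constant on darts sharing a tail should be an
# eigenvector of A^T A with eigenvalue (r - 1)^2.
f = [1, 2, 3, 4]
lifted = [f[tail] for tail, _ in darts]
image = [sum(G[i][j] * lifted[j] for j in range(len(darts)))
         for i in range(len(darts))]
eigen_ok = image == [(r - 1) ** 2 * value for value in lifted]

print(structure_ok, eigen_ok)
```

The same two checks can be repeated on any other simple regular graph by changing `k4_edges` and `r`.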

Corollary 8.5. The operator norm of $A_0$ is equal to $r-1$.

Consider now the Laplace operator on $\mathcal{L}(G)$: $\Delta_{\mathcal{L}(G)} = (r-1)I - A(\mathcal{L}(G))$. We
will need the following in the sequel:

Theorem 8.6. Let $E_{r-1}$ be the eigenspace of $(r-1)^2$ for $A^*A$. If $V^*(G)$ is the
space of functions on the vertices of $G$, then

(a): $E_{r-1} = \mathcal{L}(V^*(G))$,

(b): $\Delta_{\mathcal{L}(G)}(E_{r-1}) = \nabla(V^*(G))$,

(c): $\nabla(V^*(G)) \cap E_{r-1} \cap \mathbf{1}^{\perp} = 0$, unless $G$ is bipartite.



Proof. Part (a) is the content of Observation 8.4. Part (b) is a corollary of Part
(a). Indeed, $\Delta_{\mathcal{L}(G)}(f)(x) = (r-1)f(x) - \sum_{h(x)=t(y)} f(y)$. If $f = \mathcal{L}g$, then
(32) $\Delta_{\mathcal{L}(G)}(f)(x) = (r-1)\bigl(g(t(x)) - g(h(x))\bigr)$,

since all the $y$ adjacent to $x$ have the same tail, equal to the head of $x$.

To show Part (c), suppose $\nabla(V^*(G)) \cap E_{r-1} \neq 0$. Let $g$ be in the intersection,
and $k$ be such that $\nabla(k) = g$. It follows that for any $x, y$ such that $t(x) = t(y)$,
$g(x) = g(y)$. We see that $k(h(x)) - k(t(x)) = k(h(y)) - k(t(y))$, which implies in
turn that $k(h(x)) = k(h(y))$. So, $k$ is an eigenvector of the eigenvalue $0$ of the
Laplace operator on $G$, and hence is constant, unless $G$ is bipartite. □

We end this section with a remark necessary to compute distributions, as done
in the following Section 8.2. To wit:



Remark 8.7. The adjacency matrix of the line graph of a non-bipartite graph $G$
is primitive. That is, there is only one eigenvalue on the circle of radius $r-1$ in
the complex plane, and that is $r-1$. Its geometric multiplicity is 1.

Proof. Doubtlessly there are simpler arguments, but we choose to use the results
(described in [ST96]) on the Ihara zeta function $Z$ of $G$, which can be expressed as
a determinant in two ways:

The first way (the original theorem of Ihara [Ihara66]) is:

(33) $Z^{-1}(u) = (1 - u^2)^{\mathcal{R}-1} \det\bigl((1 + (r-1)u^2)I - uA\bigr)$,

with $A$ the adjacency matrix of $G$, and $\mathcal{R}$ the rank of the fundamental group of $G$.

The second way (due to Hyman Bass [Bass92]) is

(34) $Z^{-1}(u) = \det(I - uM)$,

where $M$ is the adjacency matrix of the directed line graph of $G$.

The equality of the two expressions implies that $v$ is an eigenvalue of $M$ if
and only if $w = v + (r-1)/v$ is an eigenvalue of $A$ (we are ignoring the eigenvalues
$\pm 1$, which occur with large multiplicity in the spectrum of $M$). Suppose that
$v$ has modulus $r-1$, so that $v = (r-1)\exp(i\theta)$, for some $\theta$. It follows that
$w = (r-1)\exp(i\theta) + \exp(-i\theta)$ is an eigenvalue of $A$, and since $A$ is symmetric,
$w$ is real, so $\theta \in \{0, \pi\}$. If $\theta = 0$, $v = r-1$, while if $\theta = \pi$, $v = -(r-1)$, but then $w = -r$ is an
eigenvalue of $A$, and so $G$ is bipartite.

The statement about the multiplicity of the eigenvalue $r-1$ is immediate, since
$\mathcal{L}(G)$ is clearly strongly connected. □
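
The agreement of the two determinant expressions (33) and (34) can be tested exactly at a rational point. The sketch below is an illustration under stated assumptions rather than the paper's own computation: it takes $G = K_4$ (so $r = 3$ and $\mathcal{R} - 1 = E - V = 2$), evaluates both sides at $u = 1/5$ with exact rational arithmetic, and uses helper names (`det`, `ihara_side`, `bass_side`) that are introduced here only for the example.

```python
from fractions import Fraction


def det(matrix):
    """Exact determinant by Gaussian elimination over the rationals."""
    m = [[Fraction(x) for x in row] for row in matrix]
    n = len(m)
    result = Fraction(1)
    for col in range(n):
        pivot = next((row for row in range(col, n) if m[row][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            result = -result
        result *= m[col][col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for c in range(col, n):
                m[row][c] -= factor * m[col][c]
    return result


def ihara_side(edges, n_vertices, r, u):
    """(1 - u^2)^(E - V) * det((1 + (r-1)u^2) I - uA) for an r-regular graph."""
    a = [[0] * n_vertices for _ in range(n_vertices)]
    for p, q in edges:
        a[p][q] = a[q][p] = 1
    mat = [[(1 + (r - 1) * u * u) * (i == j) - u * a[i][j]
            for j in range(n_vertices)] for i in range(n_vertices)]
    return (1 - u * u) ** (len(edges) - n_vertices) * det(mat)


def bass_side(edges, u):
    """det(I - uM), with M the adjacency matrix of the directed line graph."""
    darts = [(p, q) for p, q in edges] + [(q, p) for p, q in edges]
    mat = [[(x == y) - u * (x[1] == y[0] and y != (x[1], x[0]))
            for y in darts] for x in darts]
    return det(mat)


k4_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
u = Fraction(1, 5)
lhs = ihara_side(k4_edges, 4, 3, u)
rhs = bass_side(k4_edges, u)
print(lhs == rhs)
```

Exact fractions are used instead of floating point so that the comparison is an equality test rather than a tolerance check.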

We include the following observations both for the sake of completeness, and in
view of Lemma 8.12 below.

Lemma 8.8. $\Delta\mathcal{L} = (r-1)\nabla$.

Proof. Indeed, $\mathcal{L}(f)(x) = f(t(x))$. Further,

(35) $\Delta\mathcal{L}(f)(x) = \sum_{t(y)=h(x)} \bigl(f(t(x)) - f(t(y))\bigr) = (r-1)\bigl(f(t(x)) - f(h(x))\bigr) = (r-1)\nabla(f)(x)$.

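Lemma 8.9. For any $f, g \in V^*(G)$, we have

$(\mathcal{L}f)^t \nabla g = f^t \Delta g$.

Proof. Indeed,

(36) $(\mathcal{L}f)^t \nabla g = \sum_{x} f(t(x))\bigl(g(t(x)) - g(h(x))\bigr) = \sum_{v \in V(G)} \sum_{w \text{ adjacent to } v} \bigl(f(v)g(v) - f(v)g(w)\bigr) = \sum_{v \in V(G)} f(v)\,\Delta g(v)$.

□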

Consider now a function $g$ on the directed edges of $G$. How do we decompose it
into a gradient and a function orthogonal to gradients? First, we note that a basis
of the gradients is formed by the gradients of $\delta$ functions:

(37) $\delta_v(x) = \begin{cases} 1, & x = v, \\ 0, & \text{otherwise.} \end{cases}$

So that

(38) $\nabla\delta_v(x) = \begin{cases} 1, & t(x) = v, \\ -1, & h(x) = v, \\ 0, & \text{otherwise.} \end{cases}$

The functions $\nabla\delta_v$ form a basis of $\nabla(V^*(G))$, though not an orthonormal one.
Now, note that

$g^t \nabla\delta_v = \sum_{t(x)=v} g(x) - \sum_{h(y)=v} g(y)$.
In other words,

Lemma 8.10. $g$ is orthogonal to the gradients if and only if the sum of $g$ over the
edges coming into any vertex $v$ is equal to the sum of $g$ over the edges leaving $v$.
An equivalent condition is that $\nabla^* g = 0$.

One may ask: what is the orthogonal projection of a given $\mathcal{L}f$ onto the gradients?
The following comes out of an easy computation:

Observation 8.11. The orthogonal projection of $\mathcal{L}f$ onto the set of gradients is
$\nabla\Delta f$.

8.2. Applications to distribution. We can use the results of the previous section
to understand the limiting distribution of functions defined on (directed) edges of
$G$. Indeed, we can use Theorem 7.1 in the form corresponding to Eq. to observe
that

(39) <j'{i) = ^f*(A„-i)*((r - 1)^1 - AiCiG)rAiCiG))A,'{ 

for $\mathbf{f}$ any function on the directed edges of $G$, and $\Delta_0$ the restriction of the Laplace
operator on $\mathcal{L}(G)$ to the subspace of 0-sum vectors.

Lemma 8.12. The right hand side of the equation above vanishes precisely when $\mathbf{f}$ is the
gradient of a function on the vertices of $G$.

Proof. Let $\mathbf{f} = \Delta u$. By Observation 8.4 we see that the right hand side of Eq. (39)
vanishes precisely if $u \in \mathcal{L}(V^*(G))$. By part (b) of Theorem 8.6 it follows that this
is so if and only if $\mathbf{f} \in \nabla(V^*(G))$. □

One direction of the above lemma is just common sense, since the sum over any
cycle of a gradient is equal to 0.

Keeping the above in mind, we note that a simpler form of the covariance is
given by Theorem 7.1:


(40) a^(f) = ^[f*(l-2(r-l)Ao-i)f] 

For functions on the vertices of G, the above assumes the form: 

(41) a'{{) ^ ^ [f*£* (I - 2(r - 1)A^') Ct] 

8.3. The line graph of a directed graph. The construction of the line graph of
a directed graph $G$ is essentially the same as that of an undirected graph. This time,
the vertices of $\mathcal{L}(G)$ are the edges of $G$, without labels (so $\mathcal{L}(G)$ has $E(G)$ vertices). The operators $\nabla$
and $\mathcal{L}$ are defined as in Section 8.1. We have an observation even simpler than
its undirected counterpart:

Observation 8.13. There is a natural bijective correspondence between walks on
$\mathcal{L}(G)$ and walks on $G$.

If $G$ is an $r$-regular directed graph (by this we mean that both the in- and out-degree
of each vertex is equal to $r$), then so is $\mathcal{L}(G)$; by Observation 8.13, $\mathcal{L}(G)$ is
connected whenever $G$ is. As before, $A(\mathcal{L}(G))$ is the adjacency matrix of $\mathcal{L}(G)$. We
can compute:
(42) $A^*(\mathcal{L}(G))\,A(\mathcal{L}(G)) = r\begin{pmatrix} J_1 & & & \\ & J_2 & & \\ & & \ddots & \\ & & & J_{V(G)} \end{pmatrix}$,

where each block corresponds to the set of edges of $G$ emanating from a given
vertex. From this we have:

Observation 8.14. The spectrum of $A^*(\mathcal{L}(G))A(\mathcal{L}(G))$ has the following form:
The eigenvalue $r^2$ occurs $V(G)$ times, and the corresponding eigenvectors are given
by $\mathcal{L}f$ for arbitrary functions $f$ on $G$ (the Perron eigenvector corresponding to
the constant function), while the eigenvalue $0$ occurs $E(G) - V(G)$ times. The
eigenvectors for the eigenvalue $0$ are those functions on the edges of $G$ for which the sum of the values
over all edges leaving a vertex $v$ is equal to $0$ (for all $v$).

Corollary 8.15. The operator norm of $A_0(\mathcal{L}(G))$ is equal to $r$.

The Laplace operator on $\mathcal{L}(G)$ is defined as: $\Delta_{\mathcal{L}(G)} = rI - A(\mathcal{L}(G))$.
We have

Theorem 8.16. Let $E_r$ be the eigenspace of $r^2$ for $A^*A$. If $V^*(G)$ is the space of
functions on the vertices of $G$, then

(a): $E_r = \mathcal{L}(V^*(G))$,

(b): $\Delta_{\mathcal{L}(G)}(E_r) = \nabla(V^*(G))$.

We also include

Remark 8.17. The adjacency matrix of the line graph of $G$ is primitive if the
adjacency matrix of $G$ is.

Proof. We use Observation 8.13 and Theorem 15.1 to note that the non-zero eigenvalues
of $G$ are exactly the same as those of $\mathcal{L}(G)$, since $\det(I - uA(G)) = \det(I - uA(\mathcal{L}(G)))$. □



Lemma 8.18. $\Delta\mathcal{L} = r \cdot \nabla$.

Lemma 8.19. For any $f, g \in V^*(G)$, we have

$(\mathcal{L}f)^t \nabla g = f^t \Delta g$.

Lemma 8.10 and Observation 8.11 go through without change.
The results of Section 8.2 go through essentially without change. Since some
constants change we restate them here. First, let $\mathbf{f}$ be a function defined on the
edges of $\mathcal{L}(G)$. We see that:



(43) 



2rk 



fiA-'Yr'l ~ A{CiG)yA{CiG))A-'f 



Lemma 8.12 holds as well, and this gives us the following useful corollary (a
homological condition) about distribution on $G$ itself:

Theorem 8.20. The variance of a function $f$ on the vertices of $G$ vanishes precisely
when there exists a function $g$ such that $\mathcal{L}f = \nabla g$.

Finally, we have a version of formula 
(44) a2(f) = ^[f*(l-2rAo-i)f] 



9. Distribution in compact groups

The methods above can be adapted to the following setting: Let
$G$ be a graph, and $T$ be a compact topological group. Label the $i$-th vertex of
$G$ with $t_i \in T$. Now, associate to each cycle $c = v_1, \dots, v_k$ on $G$ the element
$t_c = t_k \cdots t_1 \in T$. We ask: as $c$ varies over the cycle space $W_N$, how are the
elements $t_c$ distributed in $T$ (with respect to the Haar measure)? The answer is
given by the following:

Theorem 9.1. If the graph $G$ is as before (connected, non-bipartite), the closed
subgroup generated by the $t_i$ ($i = 1, \dots, k$) is equal to $T$, and the elements $t_i$ do
not all lie in the same coset with respect to a one-dimensional representation of $T$,
then the elements $t_c$ become equidistributed, as $N \to \infty$.

Proof. As before, the equidistribution is equivalent to the assertion that for a non-trivial
irreducible unitary representation $\rho$,

(45) $\sum_{c \in W_N} \mathrm{tr}(\rho(t_c)) = o(|W_N|)$.

This follows from the Fourier transform formula for compact groups; see [FH91] for
the finite case, [Weil51] for the general compact topological group case. Now, let
$U(\rho)$ be the $k \deg\rho \times k \deg\rho$ block-diagonal matrix whose $j$-th block is just $\rho(t_j)$.
Furthermore, as before, let $A(G)$ be the adjacency matrix of $G$, and $A_l(G) =
A(G) \otimes 1_l$ (where $1_l$ is the $l \times l$ identity matrix: in other words, $A_l(G)$ is a $kl \times kl$
matrix, obtained from $A(G)$ by replacing each element $a_{ij}$ by an $l \times l$ diagonal matrix $M_{ij}$,
all of whose diagonal elements are equal to $a_{ij}$). It is not hard to see that the left hand
side of Eq. (45) is equal to $\mathrm{tr}\,(U(\rho)A_{\deg\rho}(G))^N$, and so it suffices to show that the
spectral radius of $M_\rho = U(\rho)A_{\deg\rho}(G)$ is strictly smaller than the spectral radius of
$A(G)$ (which we normalize to be equal to 1 by scaling) under the hypotheses of the
theorem. Suppose not. Since $U(\rho)$ is unitary, the worst that can happen is that
there exists a unit vector $v$, such that $\|M_\rho(v)\| = 1$. If that is so, $v$ is contained
in the eigenspace of eigenvalue 1 of $A_{\deg\rho}$. In such a case, $v = v_1 \otimes u$, where
$u \in V(\rho)$, and $v_1$ is an eigenvector of $A(G)$ with eigenvalue 1. If $v_1 = (v_1^1, \dots, v_1^k)$,
then $v_1^i u$ must be an eigenvector of $\rho(t_i)$, for all $i$. Since $v_1^i \neq 0$ for all $i$, this implies
that $u$ is an eigenvector of $\rho(t_i)$ for all $i$. Since $\rho$ is irreducible, this implies that either the
elements $t_1, \dots, t_k$ do not generate all of $T$, or $\rho$ is 1-dimensional, in which case
clearly $\rho(t_i) = \rho(t_j)$, which proves the theorem. □



Remark 9.2. As in Remark 7.4, the above argument also works if we pick all 
paths between the i-th and the j-th vertex of G, instead of all cycles. 
10. Some perturbations and estimates

Consider an analytic family of linear operators $M(x)$, acting on $\mathbb{R}^k$, with $M(0) =
M$, and let $\lambda$ be a simple eigenvalue of $M$. Then, if

$M(x) = M + M^{(1)}x + M^{(2)}x^2 + \dots$,

perturbation theory (see [Kato66, page 79, (2.33)]) tells us that

$\lambda(x) = \lambda + \lambda^{(1)}x + \lambda^{(2)}x^2 + \dots$,

where

(46) $\lambda^{(1)} = \mathrm{tr}\, M^{(1)}P_\lambda$,

(47) $\lambda^{(2)} = \mathrm{tr}\left[M^{(2)}P_\lambda - M^{(1)}S_\lambda M^{(1)}P_\lambda\right]$,

where $P_\lambda$ is the projection onto the eigenspace of $\lambda$, while $S_\lambda$ is the reduced resolvent
of $M$ at $\lambda$, which is the holomorphic part of the resolvent of $M$ at $\lambda$, defined by
the properties

(48) $S_\lambda P_\lambda = P_\lambda S_\lambda = 0$; $(M - \lambda I)S_\lambda = S_\lambda(M - \lambda I) = I - P_\lambda$,

(in other words, $S_\lambda$ is the inverse of $M - \lambda I$ restricted to the orthogonal complement
of the eigenspace of $\lambda$), and thus

(49) $MS_\lambda = I - P_\lambda + \lambda S_\lambda$.

Now we will specialize a bit: 

Assumption 1. The eigenvalue $\lambda$ is such that the constant vector $\mathbf{1}$ spans the
eigenspace of $\lambda$.

In this case, $P_\lambda = J_k/k$, where we recall that $J_k$ is the $k \times k$ matrix of all 1s.
In addition,

Assumption 2. We will assume that $M(x) = D(x)M$, where $D(x)$ is an analytically
varying diagonal matrix, $D(x) = D + D^{(1)}x + D^{(2)}x^2 + \dots$, where we say
that the diagonal elements of $D^{(j)}$ are $d^{(j)} = (d_1^{(j)}, \dots, d_k^{(j)})$.

Lemma 10.1. Let $A = (A_{ij})$ be an $n \times n$ matrix. Then

$\mathrm{tr}\, AJ_n = \sum_{1 \le i,j \le n} A_{ij}$.

Lemma 10.2. Let $A = (A_{ij})$ be an $n \times n$ matrix, and let $X$ be an $n \times n$ diagonal
matrix. Then

$(XAX)_{ij} = A_{ij}X_{ii}X_{jj}$.

Lemma 10.3. Let $D$ be a diagonal matrix, with diagonal elements $d_1, \dots, d_n$.
Then

$v^t D v = \sum_{i=1}^{n} d_i v_i^2$.

The proofs of the above lemmas are immediate.
Lemma 10.4. Let $P_v$ be the projection operator on the subspace generated by $v$ (a
unit vector). Then

$\mathrm{tr}\, MP_v = v^t M v$.

In particular, if $v$ is an eigenvector of $M$ with eigenvalue $\lambda$, then $\mathrm{tr}\, MP_v = \lambda\|v\|^2$.

Proof. This follows by a direct computation, since when $v$ is a unit vector, $(P_v)_{ij} = v_i v_j$. □

Lemma 10.5. If $v$ is an eigenvector of $M$ with eigenvalue $\lambda$, then $MP_v = \lambda P_v$.

Lemma 10.6. Suppose that $\lambda$ has multiplicity 1, and $v(\lambda)$ is a unit vector generating
the eigenspace of $\lambda$, and $M(t) = D(t)M$, where $D(t)$ is a diagonal matrix.
Then

$\lambda'(M) = \lambda\, v^t(\lambda) D' v(\lambda)$.

Proof. By Formula (46), we have

$\lambda'(M) = \mathrm{tr}\, M'P_{v(\lambda)} = v^t(\lambda)M'v(\lambda) = \lambda\, v^t(\lambda)D'v(\lambda)$.

□

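Corollary 10.7. In the case when $v(\lambda)$ is proportional to $\mathbf{1}$, we have:

(50) $\lambda^{(1)} = \frac{\lambda}{k}\sum_{i=1}^{k} d_i^{(1)}$.

To compute the second derivative of $\lambda$, we use the formula (47) (we are assuming
that $\lambda$ is an isolated eigenvalue with eigenvector $v(\lambda)$, and $M(t) = D(t)M$, as
before):

$\lambda'' = \mathrm{tr}\left[M''P_{v(\lambda)} - M'S_\lambda M'P_\lambda\right]$
$\quad = \lambda v^t D'' v - \mathrm{tr}\left[M'S_\lambda M'P_\lambda\right]$
$\quad = \lambda v^t D'' v - \lambda\, \mathrm{tr}\left[D'MS_\lambda D'P_\lambda\right]$
$\quad = \lambda v^t\left[D'' - D'MS_\lambda D'\right]v$.

We can now use the formula (49) to get:

(51) $\lambda'' = \lambda v^t\left[D'' - D'(I - P_\lambda)D' - \lambda D'S_\lambda D'\right]v$.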



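In the special case where the eigenvector $v$ is proportional to $\mathbf{1}$, we can rewrite
the formula in coordinates in a simple way. To wit, any diagonal matrix $D$ can
be written (uniquely) as $D_0 + dI$, where $D_0$ is such that $\mathrm{tr}\, D_0 = 0$. A simple
computation then shows that

(52) $\frac{\lambda''}{\lambda} = d'' - \frac{1}{k}\left[\sum_{i=1}^{k} (d'_{0,i})^2 + \lambda\, d'^{\,t} S_\lambda d'\right]$,

where $d'$ denotes the vector of diagonal entries of $D'$, and $d'_0$ that of $D'_0$.

The case we are interested in is still more special, and that is where

Assumption 3. $D(x) = \mathrm{diag}\bigl(\exp(i f_1 x), \exp(i f_2 x), \dots, \exp(i f_k x)\bigr)$.

Here, $d^{(1)} = (i f_1, i f_2, \dots, i f_k)$, while $d^{(2)} = -\frac{1}{2}(f_1^2, f_2^2, \dots, f_k^2)$,
and so, letting $\mathbf{f} = (f_1, \dots, f_k)$,

(53) $\lambda^{(2)} = \frac{\lambda}{k}\left[-\frac{1}{2}\|\mathbf{f}\|^2 + \|\mathbf{f}_0\|^2 + \lambda\, \mathbf{f}_0^t S_\lambda \mathbf{f}_0\right]$,

where, as before, $\mathbf{f}_0$ is the component of $\mathbf{f}$ orthogonal to constants.
To show our final estimates we shall need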
To show our final estimates we shall need 

Assumption 4. The matrix $M$ is $\lambda > 0$ times a doubly stochastic matrix (this
implies that the operator norm and the spectral radius of $M$ are both equal to $\lambda$).

Theorem 10.8. With assumptions as above, and, in addition, $\mathbf{f} = \mathbf{f}_0$ (that is,
$\sum_{j=1}^{k} f_j = 0$), $\lambda^{(2)}$ is nonpositive.

Proof. Since $\mathbf{f} = \mathbf{f}_0$, Equation (53) can be rewritten as

(54) $\lambda^{(2)} = -\frac{\lambda}{2k}\left[-\|\mathbf{f}_0\|^2 - 2\lambda\, \mathbf{f}_0^t S_\lambda \mathbf{f}_0\right] = -\frac{\lambda}{2k}\left[\mathbf{f}_0^t(-I - 2\lambda S_\lambda)\mathbf{f}_0\right]$.

If we regard $S_\lambda$ as an operator on the orthogonal complement to $\mathbf{1}$, then by equations
(48) and (49), $S_\lambda(\lambda I_0 - M_0) = -I_0$. Let $v = -S_\lambda \mathbf{f}_0$, so that $\mathbf{f}_0 = (\lambda I_0 - M_0)v$. Then the term in square
brackets in Eq. (54) can be rewritten as:

(55) $v^t(\lambda I_0 - M_0)^t(-I - 2\lambda S_\lambda)(\lambda I_0 - M_0)v = v^t\left(\lambda^2 I_0 - M_0^t M_0\right)v$,

where we have used the fact that for any matrix $A$ and any vector $v$, $v^t A v =
v^t A^t v$. The quadratic form $\lambda^2 I_0 - M_0^t M_0$ is positive semi-definite, since the biggest
eigenvalue of the symmetric matrix $M_0^t M_0$ is equal to the square of the operator
norm of $M_0$, which, in turn, is no greater than $\lambda$, by Assumption 4 (since $M^t M$ is
$\lambda^2$ times a doubly stochastic matrix). □



Remark 10.9. In the statement of Theorem 10.8, the word "non-positive" can be
improved to "negative" under the further assumption that $M$ is irreducible, primitive,
and normal.

Proof. Since the orthogonal complement to the subspace generated by the vector $\mathbf{1}$
is invariant under $M$, it follows that $M_0$ is also normal, and so its operator norm is
equal to its spectral radius. Under the assumptions of irreducibility and primitivity,
Perron-Frobenius theory tells us that this spectral radius is strictly smaller than $\lambda$. □

11. Topological entropy

Consider a graph $G$, and consider a positive function $f$ on its vertices. For each
cycle $c$ we let $F(c)$ be the sum of values of $f$ over $c$, and we want to know how
many $c$ there are for which $F(c) < L$. We denote that number by $N(f, L)$, and
we ask ourselves how $N(f, L)$ behaves asymptotically as $L$ tends to infinity. To
understand $N(f, L)$, we consider first the matrix $U(f) = D(u^{f_1}, \dots, u^{f_n})A(G)$. As
before, we observe that the coefficient of $u^r$ in $\mathrm{tr}\, U^n(f)$ is the number of cycles of
(combinatorial) length $n$, for which $F(c) = r$. Write a formal series

$L(f, u) = \sum_{n} \mathrm{tr}\, U^n(f)$.

This series converges for sufficiently small $u$, and can there be written in closed
form as $L(f, u) = \mathrm{tr}\,(I - U(f))^{-1}$, from which it follows that the exponential rate of
growth of $N(f, L)$ is equal to the negative logarithm of the radius of convergence of $L(f, u)$
(we call this the entropy of $G$, $f$). This radius of convergence, in turn, is equal to the smallest positive
real value of $u$ such that the spectral radius of $U(f)$ is equal to 1. Since it is more
convenient to deal with analytic functions (which $L(f, u)$ is not, for arbitrary real
values of $f_i$), we write $u = \exp(-s)$, and now ask for the abscissa of convergence
of $L(f, \exp(-s))$. This will give us the entropy. In this section we use perturbation
methods in a rather straightforward way to get explicit information on the entropy.

Let $A$ be an $n \times n$ non-negative primitive irreducible matrix. Let $f_1, \dots, f_n$ be a
collection of weights. We then define the matrix $E(s, \mathbf{f})$ to be the diagonal matrix
whose $ii$-th element is equal to $\exp(-s f_i)$. Define $M(s, \mathbf{f})$ to be $M(s, \mathbf{f}) = E(s, \mathbf{f})A$.
We are interested in $\rho(s, \mathbf{f})$: the spectral radius of $M(s, \mathbf{f})$. By Perron-Frobenius
theory we know that there is a real eigenvalue of $M(s, \mathbf{f})$ equal to $\rho(s, \mathbf{f})$, and the
eigenvector $v_\rho$ of this eigenvalue is positive.
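
As a numerical illustration of these definitions, the value of $s$ at which the spectral radius of $M(s, \mathbf{f})$ equals 1 can be located by bisection. This is a minimal sketch under stated assumptions, not a method taken from the paper: it assumes positive weights (so that the spectral radius decreases in $s$), estimates the spectral radius by plain power iteration (adequate for primitive non-negative matrices), and uses the triangle $K_3$ as a test graph. The function names and the search interval are hypothetical choices made for this example.

```python
import math


def spectral_radius(matrix, iterations=500):
    """Power iteration with max-normalisation; for a primitive non-negative
    matrix the normalising factor converges to the Perron eigenvalue."""
    n = len(matrix)
    vec = [1.0] * n
    radius = 0.0
    for _ in range(iterations):
        new = [sum(matrix[i][j] * vec[j] for j in range(n)) for i in range(n)]
        radius = max(new)
        vec = [x / radius for x in new]
    return radius


def rho(adj, weights, s):
    """Spectral radius of M(s, f) = E(s, f) A, with E = diag(exp(-s f_i))."""
    n = len(adj)
    m = [[math.exp(-s * weights[i]) * adj[i][j] for j in range(n)]
         for i in range(n)]
    return spectral_radius(m)


def critical_exponent(adj, weights, lo=0.0, hi=50.0, tol=1e-12):
    """Bisection for the s with rho(s, f) = 1, assuming positive weights
    and that the root lies in [lo, hi]."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if rho(adj, weights, mid) > 1:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


k3 = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]  # adjacency matrix of the triangle
s_uniform = critical_exponent(k3, [1.0, 1.0, 1.0])
s_mixed = critical_exponent(k3, [1.0, 2.0, 3.0])
print(s_uniform, s_mixed)
```

With all weights equal to 1 the matrix is $e^{-s}A$, so the root should be $\log 2$ for the triangle, which gives a simple sanity check on the routine.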



Lemma 11.1.

(56) $\frac{\partial\rho}{\partial s} = -\rho\, v_\rho^t D(f_1, \dots, f_n) v_\rho$.

For positive $\mathbf{f}$, $\frac{\partial\rho}{\partial s} < 0$.

Proof. This follows immediately from Lemma 10.4 and the positivity of $\rho$ and $v_\rho$. □

Lemma 11.2. We have the following expression for the gradient of $\rho$ with respect
to $\mathbf{f}$:

(57) $\nabla_{\mathbf{f}}\,\rho = -s\rho\,(v_1^2, \dots, v_n^2)$,

where $v_\rho = (v_1, \dots, v_n)$.

Proof. We note that

$\frac{\partial M}{\partial f_i} = -s D(0, \dots, 1, \dots, 0)M$,

where the 1 is in the $i$-th place. Thus, by formula (46) we have

$\frac{\partial\rho}{\partial f_i} = -s\, v_\rho^t D(0, \dots, 1, \dots, 0) M v_\rho = -s\rho v_i^2$. □

This can be restated as saying that the derivative of $\rho$ in the direction of a vector
$\mathbf{g}$ is equal to $-\rho s\, v_\rho^t D(\mathbf{g}) v_\rho$.

This gives us the following important corollary:

Corollary 11.3. Consider deformations $\mathbf{g}$ keeping the sum of $f_i$ fixed. Then the
critical points of $\rho$ occur precisely for those $\rho$ for which $|v_i| = |v_j|$, for any $i, j$.
We can also compute the second directional derivative of $\rho$. Indeed, let $\mathbf{g} =
(g_1, \dots, g_n)$ be the direction vector, so that we want to compute the second derivative
with respect to $t$ of $\rho(s, \mathbf{f} + t\mathbf{g})$ at $t = 0$. To do this, we use the formula
(47):

(58) $\rho'' = \mathrm{tr}\left[M''P_{v(\rho)} - M'S_\rho M'P_\rho\right]$.

Note that (as in the proof of Lemma 11.2)

(59) $M' = -sD(g_1, \dots, g_n)M$,

while

$M'' = s^2 D(g_1^2, \dots, g_n^2)M$,

and so

(60) $\mathrm{tr}\, M''P_{v(\rho)} = s^2\rho\, v^t D(g_1^2, \dots, g_n^2)v = s^2\rho\,\{D(g_1, \dots, g_n)v\}^t\{D(g_1, \dots, g_n)v\}$.

To understand the second term of the right-hand side of Eq. (58), first note that
(by Eq. (59))

$M'S_\rho M'P_\rho = s^2 D(g_1, \dots, g_n)MS_\rho D(g_1, \dots, g_n)MP_\rho = \rho s^2 D(g_1, \dots, g_n)MS_\rho D(g_1, \dots, g_n)P_\rho$,

where the second equality is by Lemma 10.5. Now

(61) $\mathrm{tr}\, M'S_\rho M'P_\rho = \rho s^2\, v(\rho)^t D(g_1, \dots, g_n)MS_\rho D(g_1, \dots, g_n)v$

(62) $\quad = \rho s^2\,\{D(g_1, \dots, g_n)v\}^t MS_\rho \{D(g_1, \dots, g_n)v\}$.

Putting together Eq. (60) and Eq. (62), we see that

(63) $\rho'' = \rho s^2\,\{D(g_1, \dots, g_n)v\}^t (I - MS_\rho)\{D(g_1, \dots, g_n)v\}$.

Using the formula (49), equation (63) simplifies further to:

(64) $\rho'' = \rho s^2\,\{D(g_1, \dots, g_n)v\}^t (P_{v(\rho)} - \rho S_\rho)\{D(g_1, \dots, g_n)v\}$.

The following lemma is not surprising:

Lemma 11.4. The quadratic form given by $P_v - \rho S_\rho$ is positive-definite.

Proof. On the span of $v$, the projection operator $P_v$ is equal to the identity, whilst
the reduced resolvent $S_\rho$ vanishes. On the orthogonal complement, the projection
operator vanishes, so since the Perron-Frobenius eigenvalue $\rho$ is positive, we
need to show that $S_\rho$ is negative definite. Consider a vector $w$ in the orthogonal
complement of $v$. Such a $w$ is equal to $(\rho I - M)z$, for some $z$ orthogonal to $v$. Since
$S_\rho(\rho I - M)z = -z$ by (48),

$w^t S_\rho w = -z^t(\rho I - M)z$.

So, it will suffice to show that $\rho I - M$ is positive-definite on the orthogonal complement of $v$. Suppose not. Then
there exists a $z_0$ such that $z_0^t M z_0 \ge \rho\|z_0\|^2$. By the argument in the proof of
Theorem 10.8, we see that $\|Mz_0\| \le \rho\|z_0\|$. So, $z_0^t M z_0 \ge \rho\|z_0\|^2$ implies that
$\langle z_0, Mz_0\rangle = \rho\|z_0\|^2$, and hence that $z_0$ is an eigenvector of $M$ with eigenvalue $\rho$,
which is impossible by the assumption that $M$ is irreducible and primitive. □

We finish with 
Theorem 11.5. Let $s_0(\mathbf{f})$ be the unique $s$ such that $\rho(s_0, \mathbf{f})$ is equal to 1. Then
$s_0$ is a convex function of $\mathbf{f}$, and hence assumes a unique minimum on each linear
subspace of values of $\mathbf{f}$. In particular, if we restrict to the subspace $F_0$, where
the sum of the values of $\mathbf{f}$ is equal to 1, then the minimum is achieved at the
point where

in which case the entropy is equal to ^log(j4l)i. 



Proof. The convexity of $s_0$ follows from Lemma 11.4 and Lemma 11.1. The point
at which the minimum is achieved is computed easily using Corollary 11.3, as is
the value of entropy. □

12. Applications to Groups and other objects 

The asymptotic results in the previous sections apply directly to the question of 
the growth of homology classes in the free groups, and give in some sense complete 
information: 

Observation 12.1. We see that the asymptotic order of growth of any two fixed 
homology classes is the same. 



Observation 12.2. Theorem 7.1 shows in particular that a random long cycle is
equidistributed among the vertices of a regular graph.

Observation 12.3. We see that the number of words of length $n$
in any fixed homology class in $F_k$ is asymptotic to $c_k(2k-1)^n/n^{k/2}$, where $c_k$ is
easily computed using the expression for a in the statement of Theorem \5.^ , keeping 
in mind that 

k 

where c is the parameter in the statements of theorems of the last two sections. 



Alternately, Theorem 7.1 can be used.
(c) We can compute other growth functions. For example, let $h : F_n \to \mathbb{Z}$ be the
"total exponent" homomorphism, i.e. if $F_n = \langle a_1, \dots, a_n \rangle$, then $h(a_i) = 1$. We
see that the generating function for the preimages of $j \in \mathbb{Z}$ is given by
,k , n , , , s k , n 



(2V2^^) iife (-===; X,..., x) = (2 V2^^) Rk{ 
V Zn — 1 



Observation 12.4. Instead of cyclically reduced words, it is perhaps more natural
to study conjugacy classes (ordered by their cyclically reduced length). It seems futile
to seek any enumeration as neat as Theorem 2.5; however, since the relationship
between the number $C_k$ of conjugacy classes of words of length $k$ and the number of
cyclically reduced words $W_k$ is:

(66) $C_k = \frac{W_k}{k} + O(\sqrt{W_k})$,

it is clear that the asymptotic results are the same for the two problems. For more
on this subject, see the sections on counting conjugacy classes below.






IGOR RIVIN 



Observation 12.5. Counting conjugacy classes is a problem closely related to that
of counting closed geodesics on a manifold. In the context of compact hyperbolic
surfaces, it was observed by P. Sarnak (see, for example, [RS94]) that among all
geodesics shorter than $L$, null-homologous geodesics are more numerous than those
in any other prescribed homology class (that is, while the ratio of the two quantities
approaches 1, the difference is asymptotically positive). The results of the current
note provide a certain justification for this, since any limiting distribution likely to
arise in this context is, for reasons of symmetry, likely to be unimodal, with the
mode at 0. Certainly this is true of the normal distribution, though even in this
case, a careful analysis of the error terms is required.

13. Counting conjugacy classes 

Consider a finitely presented group $G$. Let $g$ be an element of $G$. We define
the reduced length of $g$, denoted by $|g|$, to be the length of the shortest word
in the generators of $G$ representing $g$. We define the length up to conjugacy of $g$,
denoted by $|g|_c$, to be the minimum of $|h|$, the minimum being taken over all
group elements $h$ conjugate to $g$. Length up to conjugacy is obviously invariant
under conjugation, and we will also use the term to apply to conjugacy classes.

$N_G(r) = |\{g \in G \mid |g| = r\}|$,
$C_G(r) = |\{g \in G \mid |g| = r,\ |g|_c = r\}|$,
$CC_G(r) = |\{C \in G/\text{conjugacy} \mid |C|_c = r\}|$.

The subscript $G$ will be omitted whenever the group $G$ is obvious from context.
Given a sequence $A = a_0, \dots, a_n, \dots$, we can define a generating function $\mathcal{F}[A]$
by

$\mathcal{F}[A](z) = \sum_{i=0}^{\infty} a_i z^i$.

There is frequently confusion as to whether the generating function is a holomorphic
function or an element of the ring of formal power series. In this section "generating
function" will mean a function analytic at $0 \in \mathbb{C}$.

The three counting functions above give rise to corresponding generating functions
$\mathcal{F}[N_G]$, $\mathcal{F}[C_G]$, $\mathcal{F}[CC_G]$. Our real interest will lie in the last of these; the first
one has been the most extensively studied, and the result most relevant to us is:

Fact 1. If $G$ is an automatic group, then the generating function $\mathcal{F}[N_G]$ is a
rational function.

For definitions and properties of automatic groups, see [ECHLPT92].

Fact 2. (Gromov, Epstein) If $G$ is an automatic group, then the generating function
$\mathcal{F}[C_G]$ is a rational function.

Facts 1 and 2 might lead us to expect that $\mathcal{F}[CC_G]$ is, likewise, rational, but in
fact the opposite seems to be the case, and we are led to:

Conjecture 13.1. Let $G$ be a word-hyperbolic group. Then $\mathcal{F}[CC_G]$ is rational if
and only if $G$ is virtually cyclic (elementary in the terminology of [Gromov87]).

In the sequel, this conjecture is supported by the complete analysis of the case
where $G$ is $F_k$, the free group on $k$ generators.
14. Growth functions for free groups

Let $F_k$ be the free group on $k$ generators. The following is obvious:

Fact 3. $N_{F_k}(r) = 2k(2k-1)^{r-1}$.

Theorem 1.1 says that

$C_{F_k}(r) = (2k-1)^r + 1 + (k-1)\left[1 + (-1)^r\right]$.

Corollary 14.1.

$\mathcal{F}[C_{F_k}](z) = \frac{1}{1 - (2k-1)z} + \frac{1}{1 - z} + \frac{2(k-1)}{1 - z^2}$.

In order to compute $CC_{F_k}(r)$ it is enough to notice the following:

Theorem 14.2.

$CC_{F_k}(r) = \frac{1}{r}\sum_{d \mid r} \phi(d)\, C_{F_k}(r/d)$,

where $\phi$ denotes the Euler totient function.

Proof. The theorem is a trivial consequence of Burnside's lemma, stated below as
Theorem 14.3 for convenience, applied to the action of the cyclic group $\mathbb{Z}/(r\mathbb{Z})$ on
the set of cyclically reduced words of length $r$. □
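
The Burnside count can be compared against direct enumeration for small ranks and lengths. The sketch below is illustrative only: it assumes the formula $CC_{F_k}(r) = \frac{1}{r}\sum_{d \mid r}\phi(d)\,C_{F_k}(r/d)$ together with the expression for $C_{F_k}(r)$ from Theorem 1.1 (for $r \ge 1$), encodes the generators and their inverses as the integers $\pm 1, \dots, \pm k$, and uses function names chosen for this example.

```python
from itertools import product
from math import gcd


def totient(n):
    """Euler's totient function, computed naively."""
    return sum(1 for j in range(1, n + 1) if gcd(j, n) == 1)


def cyclically_reduced_count(k, r):
    """C_{F_k}(r) = (2k-1)^r + 1 + (k-1)(1 + (-1)^r), used here for r >= 1."""
    return (2 * k - 1) ** r + 1 + (k - 1) * (1 + (-1) ** r)


def conjugacy_classes_formula(k, r):
    """(1/r) * sum over divisors d of r of phi(d) * C(r/d)."""
    total = sum(totient(d) * cyclically_reduced_count(k, r // d)
                for d in range(1, r + 1) if r % d == 0)
    return total // r


def conjugacy_classes_brute_force(k, r):
    """Enumerate cyclically reduced words of length r and count the orbits
    of the rotation action of Z/rZ."""
    letters = list(range(1, k + 1)) + [-i for i in range(1, k + 1)]
    orbits = set()
    for word in product(letters, repeat=r):
        # A word is cyclically reduced if no letter is followed
        # (cyclically) by its inverse.
        if any(word[i] == -word[(i + 1) % r] for i in range(r)):
            continue
        # The lexicographically smallest rotation labels the orbit.
        orbits.add(min(word[i:] + word[:i] for i in range(r)))
    return len(orbits)


for r in range(1, 7):
    print(r, conjugacy_classes_formula(2, r), conjugacy_classes_brute_force(2, r))
```

The brute-force routine grows like $(2k)^r$, so it is only meant as a check for small $k$ and $r$.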

Theorem 14.3. Let $G$ be a finite group acting on a finite set $X$. For $g \in G$ let
$\psi(g)$ denote the number of $x \in X$ such that $g(x) = x$. Then the number of orbits
of $X$ under the $G$-action is

$\frac{1}{|G|}\sum_{g \in G} \psi(g)$.

We now have the following general observation:

Theorem 14.4. Suppose we have three sequences $A = \{a_i\}$, $B = \{b_j\}$, and $C =
\{c_k\}$, satisfying

$a_n = \sum_{d \mid n} c_d\, b_{n/d}$.

Then

$\mathcal{F}[A](z) = \sum_{d=1}^{\infty} c_d\, \mathcal{F}[B](z^d)$.

Proof. On the level of formal power series, the statement is clear by expanding
the left hand side. Otherwise, if the radius of convergence of $\mathcal{F}[A]$ is $r_A$, then the
radius of convergence of $G_d[A]$, defined as $G_d[A](z) = \mathcal{F}[A](z^d)$ is, by Hadamard's
criterion, equal to $r_A^{1/d}$, so all of $G_d[A]$ converge on the disk of radius $R_A = \min(r_A, 1)$
around the origin. Since the series on the right hand side converges at $0$ (since all
the terms vanish), it converges uniformly on compact subsets of the disk of radius
$R_A$ around the origin. □
Corollary 14.5. Let $\mathcal{H}$ be the generating function of the sequence $h_r = r\,CC(r)$.
Then

$\mathcal{H}(z) = 1 + \sum_{d=1}^{\infty} \phi(d)\,\mathcal{F}[C](z^d)$.

We can combine all of the above results into the following conclusion:

Theorem 14.6. The generating function $\mathcal{H}$ as in the statement of Corollary 14.5
can be expanded as:

2 / -1 
n = l + (k- l)—^-— + V M) i ; :—j - 1 



In particular, $\mathcal{H}$ has an infinite number of poles, and is not a rational function.
The generating function $\mathcal{F}[CC_{F_k}]$ can be written as



T[CCf,]{z) = ^ 
and so is not a rational function either. 



-dt, 



Proof. The expression for $\mathcal{H}$ is fairly obvious, with the comment that the second
summand is a consequence of the fact that

That $\mathcal{H}$ has an infinite number of poles follows from the observation that the $d$-th
term in the third summand has its $d$ poles on the circle $|z| = (2k-1)^{-1/d}$, while the
first two summands are analytic in the open unit disk. The expression for $\mathcal{F}[CC_{F_k}]$
is immediate. □



Remark. Various people, when shown Theorem 14.6, appeared to believe that it
contradicts [Gromov87, Theorem 5.2D]. In fact (as pointed out by Greg McShane),
Gromov's function $[N]_k$ is not (as the common misunderstanding has it) the same
as $CC_G(r)$ in the case of a free group, but is the same as $C_G(r)$.

14.1. Some further comments. The following observation is quite obvious: 

Observation 14.7. Let Gi and G2 be two groups. Then, 



Observation 14.7 has some consequences: 



Theorem 14.8. Let $G_1$ and $G_2$ be two groups. If $\mathcal{F}[CC_{G_1}]$ is rational, while
$\mathcal{F}[CC_{G_2}]$ is not, then $\mathcal{F}[CC_{G_1 \times G_2}]$ is not rational. If both $\mathcal{F}[CC_{G_1}]$ and $\mathcal{F}[CC_{G_2}]$ are
rational, then so is $\mathcal{F}[CC_{G_1 \times G_2}]$.

Corollary 14.9. If $G_1 = \mathbb{Z}^n$ and $G_2$ is a finite group, then $\mathcal{F}[CC_{G_1 \times G_2}]$ is rational.

Remark. It is not clear whether $\mathcal{F}[CC_G]$ is rational when $G$ is a Bieberbach group;
most likely this depends on the choice of the generating set, as conjectured by
D. B. A. Epstein.

Corollary 14.10. If $G_1 = F_k$ and $G_2$ is a direct product of finite groups and
infinite cyclic groups, then $\mathcal{F}[CC_{G_1 \times G_2}]$ is irrational.

Theorem 14.11. If $G = F_{k_1} \times F_{k_2} \times \dots \times F_{k_n}$, then $\mathcal{F}[CC_G]$ is irrational (with
respect to the "obvious" generating set).

Proof. This is an immediate consequence of Theorem 14.6. □



15. Primitive conjugacy class zeta function

One can compute a zeta-function analogous to that of Ihara for the numbers of
primitive conjugacy classes of a given length (a primitive class is one which is not
the power of a smaller class), using, essentially, the elementary method described
by Stark and Terras [ST96], as applied to the graph constructed above.
This function turns out to be rational (in fact, there is a simple formula for it, see
Theorem 15.1). More precisely, consider

(67) $\zeta(G) = \prod_{[c]} \left(1 - u^{|c|}\right)$,

where $[c]$ denotes the equivalence classes of primitive cycles, where two cycles are
considered equivalent if one can be obtained from the other by a rotation.
A computation then shows that

(68) $\zeta(F_r) = (1 - u^2)^{r-1}(1 - u)(1 - (2r-1)u)$.

The computation goes as follows:
First, note that

(69) $\log\zeta(G) = -\sum_{[c]}\sum_{l=1}^{\infty} \frac{u^{l|c|}}{l}$,

and thus

(70) $u\,\frac{d\log\zeta(G)}{du} = -\sum_{[c]}\sum_{l=1}^{\infty} |c|\,u^{l|c|}$.

The above can be rewritten (note that the sum is now over primitive cycles, and
not equivalence classes thereof):

(71) $u\,\frac{d\log\zeta(G)}{du} = -\sum_{c}\sum_{l=1}^{\infty} u^{l|c|}$.

But note that the right hand side is simply (the negative of) the ordinary generating function for all
cycles:

(72) $u\,\frac{d\log\zeta(G)}{du} = -\sum_{i=1}^{\infty} N_i u^i$,

where $N_i$ is the number of cycles of length $i$ in $G$, and this generating function was
computed in Section 1:

(73) $\sum_{i=1}^{\infty} N_i u^i = \frac{(2r-1)u}{1 - (2r-1)u} + \frac{ru}{1 - u} - \frac{(r-1)u}{1 + u}$.

The formula (68) now follows by a straightforward integration.

A quick examination of the above argument shows that the formula (68) is a
special case of the following result:
Theorem 15.1. Let $G$ be a finite graph, and let $\zeta_G$ be the zeta function defined by
formula (67). Let $A(G)$ be the adjacency matrix of $G$. Then

(74) $\zeta_G(u) = \det(I - uA(G))$.

In other words, the zeta function is essentially the characteristic polynomial of
$A(G)$.

Proof. The argument above up to Equation (72) is completely general. On the other
hand, the right hand side of equation (72) can be rewritten as:

$-\sum_{i=1}^{\infty} \mathrm{tr}\,A(G)^i\, u^i$.

Thus,

$u\,\frac{d\log\zeta(G)}{du} = -\mathrm{tr}\left[-I + \sum_{i=0}^{\infty} (uA(G))^i\right]$
$\quad = -\mathrm{tr}\left[-I + (I - uA(G))^{-1}\right]$
$\quad = -\mathrm{tr}\left(uA(G)(I - uA(G))^{-1}\right)$,

so that $\frac{d\log\zeta(G)}{du} = -\mathrm{tr}\left(A(G)(I - uA(G))^{-1}\right)$, and so it follows that

$\zeta(G) = C\det(I - uA(G))$,

where $C$ is a constant of integration, seen to be equal to 1 by computing both sides
at $u = 0$. □
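
Theorem 15.1 and formula (68) can be checked to low order by brute force. The sketch below is an illustration under stated assumptions, not the paper's computation: it takes the graph for $F_2$ to have the four letters $a, b, a^{-1}, b^{-1}$ as vertices, with an edge from $x$ to $y$ unless $y = x^{-1}$; it enumerates primitive closed walks up to length 6; and it compares the truncated product $\prod (1 - u^{|c|})$ with the expansion of $(1 - u^2)(1 - u)(1 - 3u) = 1 - 4u + 2u^2 + 4u^3 - 3u^4$. All function names are introduced here for the example.

```python
from itertools import product


def primitive_class_counts(adj, max_len):
    """counts[n] = number of rotation classes of primitive closed walks of
    length n in the directed graph with 0/1 adjacency matrix adj."""
    n_vertices = len(adj)
    counts = {}
    for n in range(1, max_len + 1):
        classes = set()
        for walk in product(range(n_vertices), repeat=n):
            if not all(adj[walk[i]][walk[(i + 1) % n]] for i in range(n)):
                continue  # not a closed walk
            rotations = [walk[i:] + walk[:i] for i in range(n)]
            if rotations.count(walk) > 1:
                continue  # fixed by a nontrivial rotation: a proper power
            classes.add(min(rotations))
        counts[n] = len(classes)
    return counts


def truncated_product(counts, max_len):
    """Coefficients of prod_n (1 - u^n)^counts[n], up to degree max_len."""
    series = [1] + [0] * max_len
    for n, multiplicity in counts.items():
        for _ in range(multiplicity):
            # Multiply in place by (1 - u^n), highest degree first.
            for deg in range(max_len, n - 1, -1):
                series[deg] -= series[deg - n]
    return series


rank = 2
size = 2 * rank
# Letter i has inverse (i + rank) mod size; x -> y is allowed unless y = x^{-1}.
adj = [[0 if j == (i + rank) % size else 1 for j in range(size)]
       for i in range(size)]

max_len = 6
counts = primitive_class_counts(adj, max_len)
series = truncated_product(counts, max_len)
print(counts)
print(series)
```

Only classes of length at most `max_len` can affect coefficients up to degree `max_len`, which is why the truncated product is enough for the comparison.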